Sir:

I. REAL PARTY IN INTEREST 

The real party in interest of the present application is Ryogen LLC. Ryogen LLC is
the owner of the present application by way of an assignment from the inventor, James W.
Ryan, of all rights, title, and interest.

II. RELATED APPEALS AND INTERFERENCES 

There are no appeals or interferences related to the present application. 

III. STATUS OF CLAIMS

Claims 7, 10, 15-18, 20, 24, 25, 30 and 31 stand finally rejected by the Examiner as
noted in the Advisory Action mailed October 29, 2007. Claims 1-6, 8-9, 11, 13, 19, 21 and
26-29 have been canceled. Claims 12, 14, 22, 23 and 32-38 have been withdrawn. The
rejections of claims 7, 10, 15-18, 20, 24, 25, 30 and 31 are appealed.

IV. STATUS OF AMENDMENTS 

No amendments have been submitted subsequent to the Examiner's Final Rejection. 



V. SUMMARY OF THE CLAIMED SUBJECT MATTER 

A. Independent Claim 1 



Claim Elements 


Support in 
specification 


An isolated nucleic acid molecule 20-51039 contiguous 
nucleotides in length consisting of a reverse or forward strand of a 
region of SEQ ID NO:4 


Page 9, line 34; 
page 10, lines 22- 
26 


wherein said region is selected from the group consisting of a 5'- 
non coding region depicted in nucleotides 51039-41739 of SEQ 
ID NO:4, a 3'-non-coding region depicted in nucleotides 9503-1 
of SEQ ID NO:4, a contiguous intron-exon region between 
nucleotides 41738-9502 of SEQ ID NO:4, wherein a sequence 
segment comprising 41738-9502 of SEQ ID NO:4 encodes human
mouse double minute 2 homolog depicted in SEQ ID NO:2, a 
contiguous exon-intron region between nucleotide 41738-9502 of 
SEQ ID NO:4, wherein a sequence segment comprising 41738- 
9502 of SEQ ID NO:4 encodes human mouse double minute 2 
homolog depicted in SEQ ID NO:2, an intron depicted in 
nucleotides 36385-40645, 36309-33127, 32994-29616, 29564- 
25577, 25507-25384, 25287-21169, 21006-14110, 13953-13267,
and/or 13188-10665, a region comprising a dinucleotide of the 
following group: 41739-41738, 40645-40646, 36309-36310, 
36384-36385, 32994-32995, 33126-33127, 29564-29565, 29615- 
29616, 25507-25508, 25287-25288, 25383-25384, 25576-25577, 
21006-21007, 21168-21169, 14109-14110, 13953-13954, 13266-
13267, 13188-13189, 10664-10665 and/or 9504-9503 


Page 10: Table 2 


a transcription binding site selected from the group consisting of 

BINDING SITES huMDM2, location in SEQ ID NO:4 

AP1_C: 36-46, 2876-2886; 

AP4_Q5: 7944-7980; 

AP4_Q6: 7943-59, 8924-8940, 9294-9310; 


Page 9, line 29 to 
page 10, line 2; 
Table 3 on pages
10-12

ARNT_01: 1682-1706, 2193-2217, 9201-9225;

BRN2_01: 1040-1058,7803-7821; 

CAAT_01: 3292-3306; 

CDPCR3HD_01: 6522-6540; 

CEBPB_01: 1424-1438, 3917-3931, 4178-4192, 4787- 
4801,6855-6869; 

CREL_01: 5630-5642; 

DELTAEF1_01: 83-95,6328-6340; 

FREAC7_01: 2757-2773, 5154-5170, 5823-5839; 

GATA1_04: 4846-4858,7017-7029; 

GATA1_05: 8464-8476; 

GATA2_02: 6045-6057, 6073-6085, 6142-6154; 

GATA2_03: 2489-2501, 3323-3335, 3384-3396, 
7393-7405;

GATA3_02: 3264-3276, 6870-6882;

GATA3_03: 40-52, 5729-5741, 6529-6541, 6874-6886, 
7041-7053,7589-7601; 

GATA_C: 7349-7361, 8188-8200;

HFH2_01: 1743-1759,7995-8011; 

HFH3_01: 502-518, 1739-1755, 4160-4176, 9402- 
9418,9418-9434; 

HFH8_01: 8184-8200; 

IK2_01: 951-963,3588-3600; 

MZF1_01: 1202-1210, 1447-1455, 4997-4005, 5424- 
5432; 
NF1_Q6: 1480-1500, 8166-8182;




NFAT_Q6: 4190-4208, 6009-6027;




NKX25_01: 741-755, 1648-1662, 1885-1899, 1984- 
1998, 3609-3623, 4928-4942, 5060-5074, 5889-5903, 8850-8864, 
9190-9204; 




NKX25_02: 2584-2599, 2970-2984, 4644-4658, 5179-
5193, 6482-6496;




NMYC_01: 2560-2572; 




RORA1_01: 220-238,2638-2656; 




S8_01: 4644-4656, 4842-4854, 4845-4857, 5200- 
5212, 5371-5383, 5735-5747, 6482-6494, 6541-6553, 6544-6556, 
6772-6784, 7270-7292, 7273-7285; 




SOX5_01: 1355-1371, 1430-1446,3094-3110,3155- 
3171, 4669-4685, 4692-4708, 4789-4805; 




SRY_02: 4164-4180,5665-5681; 




TATA_01: 1261-1277, 2574-2590, 2723-2739, 2733- 
2749, 2770-2786, 4199-4215, 4206-4222; 




TATA_C: 5900-5916, 7456-7472, 7702-7718, 7917- 
7933; and 




XFD2_01: 7702-7218,7917-7933; 




a transcription binding site selected from the group consisting of 

BINDING SITES huMDM2, location in SEQ ID NO:4 
AP1_C: 12109-12119, 12695-12705,22600-22610, 
24166-24176, 31311-31321, 35234-35244, 39184-39194; 




AP1_Q2: 11952-11962, 12068-12078, 14798-14808,
21748-21758, 22613-22623, 23676-23686, 26562-26572, 30046- 
30056; 




AP1_Q4: 12695-12705,31311-31321,35234-35244, 
36295-36305, 38784-38794, 39188-39198; 




AP4_Q6: 31635-31651; 





BRN2_01: 13448-13466, 14764-14782, 28094-28112,
40027-40045;

CAAT_01: 11288-11302, 15054-15068;

CDPCR3HD_01: 11286-11304, 13284-13302, 20846-20864,
29344-29362;

CEBPB_01: 29241-29255;

CREL_01: 36091-36103, 38873-38885;

DELTAEF1_01: 18083-18095, 20385-20397, 26955-26967;



FREAC7_01: 11982-11998, 15187-15202, 16523-16539, 

16529-16545, 16587-16603, 16604-16620, 16676-16642, 16633- 
16649, 16644-16660, 16650-16666, 16657-16673, 16673-16689, 
16762-16778, 21332-21348, 25689-25700, 26529-26545, 27767- 
27783, 29495-29511; 

GATA1_02: 10916-10928, 15775-15789, 18162-18174, 

26088-26100, 32518-32530; 



GATA1_03: 28012-28024;



GATA1_04: 11153-11165, 11630-11642, 13778-13790, 

17439-17451, 19300-19312, 21606-21618, 22743-22755, 23747- 
23759, 25806-25818, 26529-26541, 29424-29436, 30455-30467, 
32761-32778, 33352-33364, 33960-33972, 36101-36113, 40007- 
40019; 



GATA1_05: 11590-11602, 26550-26562, 36737-36749;

GATA1_06: 18772-18784, 23054-23066, 35568-35580,
37855-37867;



GATA2_02: 20755-20767, 30830-30842, 34755-34767, 

36285-36297, 39143-39155, 39641-39653, 40586-40598; 

GATA2_03: 13535-13547, 22711-22723, 23161-23173, 

25028-25040, 27237-27249, 36277-36289; 

GATA3_02: 11558-11570, 16470-16482, 17225-17237,
19619-19631, 22156-22168, 22443-22455, 24713-24725, 27619-
27631, 32716-32728, 34124-34136, 34163-34175, 36832-36844,
38403-38415;




GATA3_03: 10869-10881, 11515-11527, 13845-13857, 
17221-17233, 18952-18964, 20050-20062,40171-40183; 




GATA_C: 15848-15860, 18899-18911,23640-23652, 
29072-29084, 30881-30893, 33198-33210, 37472-37484, 38621- 
38633; 




GFI1_01: 35469-35481, 35492-35504; 




HFH2_01: 15939-15955, 24636-24652, 25866-25882, 
32171-32187, 35372-35388, 39457-35473; 




HFH3_01: 13340-13356, 19218-19234,21328-21344, 
21336-21352, 21344-21360, 28062-28078, 32125-32141; 




HFH8_01: 14133-14149, 22578-22584; 




HNF3B_01: 13150-13166, 16505-16521,25264-25280, 
29443-29459, 37654-37670; 




IK2_01: 11547-11559, 17144-17156, 18961-18973, 
23883-23895, 27617-27629, 28908-28920, 29241-29253, 30752- 
30764, 34768-34780; 




LYF1_01: 12319-12331, 19191-19203, 37226-37238,
39430-39442; 




MAX_01: 22974-22986,33339-33351; 




MZF1_01: 26105-26113, 35187-35195;




NF1_Q6: 12048-12064, 33334-33354; 




NFAT_Q6: 13295-13313, 14157-14175, 14311-14329, 
14414-14432, 18269-18287, 19326-19344, 20801-20819, 21177-
21195, 22537-22555, 23861-23879, 25392-25410, 25879-25897, 
27524-27542, 30636-30654, 30718-30736, 31525-31543, 33655- 
33673, 34726-34744, 34917-34535, 34990-35008, 35979-35997, 
36479-36493, 36577-36595, 37154-37172, 40224-40242, 40365- 
40383; 




NKX25_01: 12041-12055, 12340-12354, 12471-12485, 12742- 
12756, 12877-12891, 13849-13863, 18995-19009, 21440-21454, 
21883-21897, 28426-28440, 30964-30978, 32033-32047, 32265- 
32279;

NKX25_02: 10998-11012, 12711-12725, 14131-14145, 14726- 
14740, 16024-16038; 

NMYC_01: 18753-18765, 18754-18766, 23076-23088, 30534-
30546, 34400-34412; 

RORA1_01: 13134-13152, 22966-22984, 24934-24952, 33341- 
33359, 34760-34778; 

S8_01: 11000-11012, 11977-11989, 12048-12060, 12051- 

12063, 13747-13759, 13923-13935, 13926-13938, 14676-14688, 
14679-14691, 16026-16038, 16313-16325, 16316-16328, 17515- 
17527, 20756-20768, 20759-20771, 23154-23166, 23157-23169, 
25198-25210, 25201-25213, 26651-26663, 27508-27520, 27511- 
27523, 29450-29462, 29478-28490, 29775-29787, 29778-29790, 
29813-29825, 29816-29828, 31329-31341, 31677-31689, 31680- 
31692, 31732-31744, 31735-31747, 36137-36149, 36140-36152, 
36812-36824, 36815-36827, 37413-37425, 38679-38691, 39474- 
39486, 39477-39489; 

SOX5_01: 27397-27413, 27572-27588, 28100-28116, 29230- 
29246, 29439-29455, 30690-30706, 31595-31611, 33871-33887, 
34113-34129, 34624-34640, 37668-37684, 38582-38598, 39124- 
39140, 40410-40426; 

SRY_02: 20016-20032, 22410-22426, 27329-27345, 29162- 
29178, 29499-29515, 30646-30662, 31503-31519, 35928-35944, 
37324-37340; 

TATA_01: 32722-32738, 32729-32745, 32807-32823, 33825- 
33841, 34120-34136, 35433-35449, 36593-36609; 

TATA_C: 11015-11031, 11817-11833, 13635-13651, 14930- 
14946; 

TCF11_01: 18543-18549, 22574-22580, 31281-31297, 31489- 
31505, 38754-38770; 

USF_01: 23075-23087,32577-32589; 

VMYB_02: 11526-11538, 17384-17396, 18400-18412, 19549- 
19561, 22188-22200, 40486-40508 and 

XFD2_01: 16620-16636, 18153-18169, 22102-22118, 23141-
23157.




And a transcription binding site selected from the group consisting 
of 




BINDING SITES huMDM2 location in SEQ ID NO:4 




AP1_C: 44584-44594, 49069-49079;




AP1_Q2: 42174-42184, 45217-45227, 48422-48422, 
50447-50457; 




AP1_Q4: 42702-42712,50806-50816; 




AP4_Q6: 42117-42133, 42118-42134,
42244-42260, 45432-45448, 45433-45449,
46609-46625;




BRN2_01: 42310-42328, 44022-44040, 47514-47532, 
48900-48918, 48967-48985; 




CAAT_01: 44866-44880; 




CDPCR3HD_01: 45671-45689, 49219-49237; 




CREL_01: 42437-42449,49797-49809; 




FREAC7_01: 47026-47042, 47292-47308, 47658-47674; 




GATA1_02: 43482-43494, 48926-48938, 49284-49296; 




GATA1_03: 47371-47383; 




GATA1_04: 43054-43066, 43162-43162, 43967-43979,
45464-45476, 45916-45928, 47763-47775; 




GATA1_05: 49319-49331, 49459-49471; 




GATA1_06: 47590-47602; 




GATA2_02: 42660-42672, 43475-43487; 




GATA2_03: 43714-43726, 50948-50960; 




GATA3_02: 49155-49167, 49844-49856; 
GATA3_03: 42202-42214, 44810-44822,

48438-48450, 49136-49148, 49337-49349, 
49869-49881; 

GATA_C: 44011-44023, 45256-45268,

45823-45835, 47915-47927, 49201-49213, 
49573-49585; 

GFI1_01: 46606-46618, 47063-47075; 

HFH3_01: 47030-47046, 47284-47300, 47288-47304;

IK2_01: 45275-45287; 

LYF1_01: 44564-44576, 46991-47003, 49567-49579; 

MAX_01: 43234-43246,48726-48738; 

MZF1_01: 41772-41780, 42290-42298, 42295-42303,

44507-44515, 45105-45113, 45203-45211, 49948-49956, 
50774-50782; 

NF1_Q6: 50209-50229; 

NFAT_Q6: 42061-42079, 44418-44436, 46399-46417, 

47974-47992, 49267-49285, 49964-49982, 50392-50410;

NKX25_01: 42394-42408, 43507-43521, 46115-46129; 

RORA1_01: 45073-45091, 48718-48736; 

S8_01: 43552-43564, 45214-45226, 47160-47172, 

48419-48431, 49295-49307, 50379-50391; 

SOX5_01: 43716-43732, 46351-46367, 47156-47172, 

47774-47790, 47868-47884, 47974-47990, 48915-48931, 50323- 
50339; 

TATA_01: 45588-45604, 47625-47641, 48026-48042, 

48659-48675, 49056-49072, 49079-49095, 49152-49168; 

TCF11_01: 49115-49131; 

VMYB_02: 42010-42022, 42279-42291, 44651-44663; 
XFD2_01: 42870-42886, 42910-42926. 

B. Independent Claim 24



Claim Element: An isolated nucleic acid molecule 20-5000 contiguous
nucleotides in length
Support in Specification: Page 9, line 35 to page 10, line 2

Claim Element: consisting of a reverse or forward strand of a contiguous
exon-intron region between nucleotides 41738-9502 of SEQ ID NO:4 or
contiguous intron-exon region between nucleotides 41738-9502 of SEQ ID NO:4,
Support in Specification: Page 10, Table 2 and page 14, lines 29-32

Claim Element: wherein a sequence segment comprising 41738-9502 of SEQ
ID NO:4 encodes human mouse double minute 2 homolog depicted in SEQ ID NO:2
Support in Specification: Page 2, lines 7-13; page 10, Table 2



VI. Grounds of Rejection to Be Reviewed on Appeal 

Whether claims 7, 10, 15-18, 20, 24, 25, 30 and 31 are unpatentable over Muzny et al.
("Muzny") in view of Vogelstein et al. ("Vogelstein"). 



VII. Argument

In the final rejection dated April 16, 2007, the Office Action stated: 

It would have been obvious to one of ordinary skill in the art at 
the time the invention was made to use said cDNA to identify 
the genomic DNA that encodes the human MDM2 homolog of 
SEQ ID NO:2 on chromosome 12q12-14. The motivation is
provided by Vogelstein et al. who teach that it binds to 
oncogene p53 and is diagnostic of tumorigenesis. The state of 
the art provides various techniques for obtaining genomic DNA 
using cDNA probes that are usually labeled. The comparison of 
genomic and cDNA would result in the identification of regions 
comprising exon-intron and intron-exon junctions within coding 

and non-coding regions. One of ordinary skill in the art would 
have been motivated to use said non-coding regions or 
fragments thereof of at least 20 nucleotides and up to 5000 or 
51039 nucleotides (the entire length of SEQ ID NO:4) 
nucleotides for detecting splice variants of chromosome 12q12-
14 from genomic nucleotide samples from an individual, for
example. As a matter of convenience a non-coding region such 
as an exon-intron or intron-exon region or fragments thereof 
can be present in a kit or on a solid support. Further, said 
support can be a microarray according to a customary use of 
nucleic acid molecules in the art. 

Appellant respectfully traverses the rejection. In Appellant's view, it would not have
been obvious to combine the disclosures of Muzny and Vogelstein, given that there was no
suggestion to do so. Muzny discloses only a small portion of chromosome 12 DNA.
Chromosome 12 is about 130 million base pairs long and is believed to contain several
hundred genes (by analysis after 2001 and after the Applicant discovered the human MDM2
homologue gene). Muzny et al. knew that clone AC025423 (from 1 VI 1-61 102) was from
chromosome 12, but there is no evidence in the NCBI report of a sub-assignment to the p- or
q-arm. Further, there is no evidence that Muzny knew whether the clone did or did not
contain one or more genes, and particularly whether it contained the gene encoded by SEQ ID
NO:4. As will be discussed in further detail below, the MDM2 cDNA constitutes just 1.6%
of the clone disclosed by Muzny. Undue experimentation would have been required not only
to locate the MDM2 gene but also to identify exon-intron junctions.

Even assuming arguendo that there had been such a motivation, one of ordinary skill
in the art would not have obtained the isolated nucleic acid molecules of the present
invention, the specifically recited non-coding regions of the MDM2 gene located between
nucleotides 20-51039 of SEQ ID NO:4. This is because, as noted in previous responses,
Vogelstein placed the human MDM2 homologue gene at 12q12-14. This placement is
incorrect. After the publication of Vogelstein, the gene was found not to be located at
12q12-14. The gene is actually several million base pairs away, at 12q15 (see, for example,
Andersen et al., 1996, Mammalian Genome 7:780-783, Bureau, 1995, Genomics 28:109-112
and Genecard, attached hereto as Exhibit 1 and previously made of record). Thus, even if
these two references were indeed combined, the ordinary skilled artisan would have looked
for the MDM2 gene in the wrong location and thus would not have obtained the claimed
sequences.

In response to Remarks made by Appellant in the amendment submitted on January
23, 2007, the final Office Action states:

With regard to the 103(a) rejection, Applicant argues that 
"There was certainly no indication given in the cited art either 
singly or in combination regarding the location of the MDM2 
gene encoding human mouse double minute 2 homolog 
depicted in SEQ ID NO:2 on AC025423" (Remarks, page 13). 
As was explained in the previous Office action mailed August 
25, 2006 "the exact location of the gene is not necessary as long 
as its sequence is known as in the instant case" (page 5). At the
time the invention was made finding non-coding regions using 
cDNA and genomic DNA was standard technique. Watson et al 
(1992) "Recombinant DNA" teach that once the first genes 
were cloned, introns were identified by comparing the cloned 
genomic DNA with the corresponding cloned cDNA (page 137, 
2nd column). In the case of the instant application, both 
genomic and cDNA were known. They only needed to be 
compared in order to identify intron-exon junctions. Applicant 
further argues that "There is no prior art that defines the 
complete genomic structure of a particular gene. This is 
necessary in order to identify the claimed noncoding sequences 
in the instant invention" (page 13). This argument is similar to 
the issue of location and is responded above. Applicants further 
argues that "One of ordinary skill in the art would have no idea 
as to the number of introns and the length of the 5' and 3' 
noncoding sequences in the MDM2 gene" (page 14). It is 
agreed that said number and length were not known before the 
invention. If they were known, the rejection would be 102. The 
current rejection is 103(a) stating that it would have been 
obvious to compare genomic DNA and cDNA and identify the 
number of introns and the length of the 5' and 3' noncoding 
sequences in the MDM2 gene. Applicant further argues "The 
Examiner's assertion that one of ordinary skill in the art would 
have expected that the location is often imprecise actually 
further supports Applicant's assertion the claimed sequences 
were indeed nonobvious. If the location is imprecise, where 
would one of ordinary skill in the art know where to look?" 
(page 15). This is not persuasive because the precise location is 
not necessary when the 2 sequences that need to be compared 
are known. They would be obvious because the genomic DNA 
was already sequenced and cDNA was made. Applicant further 
argues that "It should be noted that annotation of the human 
genomic DNA was still relatively new as of the priority date of 
the instant application. Even assuming arguendo that finding 
noncoding regions using cDNA and genomic DNA was 
standard technique, the means to make the invention does not 
predict the claimed invention. Specifically the means used to 
make the invention do not predict the claimed nucleic acid 
molecules. BLASTN, TBLASTN, etc. do not themselves 
predict gene-specific results. It is Applicant's view that only 
general guidance is provided. This is not sufficient" (page 16). 
This is not persuasive because there is no need to predict the 
sequence itself, it was known before. The invention is in the 
identification of the specific fragments (exons/introns) of the 
known sequence. Applicant appears to argue as if the genomic 
DNA was not sequenced prior to the instant invention.
Applicant further argues that "it is Applicant's view that given 
that the cDNA constitutes just 1.6% of the AC025423 sequence 
is in itself evidence of the unpredictability of determining the 
entire sequence of the MDM2 gene and thus contiguous intron- 
exon and exon-intron regions. The Examiner is in effect 
asserting that just because Applicant did isolate the claimed 
nucleic acid molecule, it must have been obvious to do so. It is 
well established case law that the fact that the inventors were 
ultimately successful is irrelevant to whether one of ordinary 
skill in the art at the time the invention was made would have 
reasonably expected success" (paragraph bridging pages 16-17). 
This is not persuasive because the size of the cDNA does not 
matter. ESTs of smaller size are used for comparison to 
genomic sequences. Applicant does not show what unexpected 
difficulties other than routine comparison of the genomic DNA 
and cDNA were encountered during the time the invention was 
made. 

It appears that the Examiner is asserting that, just because a genomic clone containing
chromosome 12 sequences had been isolated, the MDM2 cDNA was known, and
techniques were known for finding noncoding regions, one of ordinary skill in the art would
have had a reasonable expectation of success in obtaining the claimed sequences. Further, the
Examiner is asserting that it is not significant that the cDNA constitutes just 1.6% of the
AC025423 sequence. Appellant disagrees for a number of reasons.

First, Appellant asserts that the Watson disclosure is of limited relevance here.
Watson states on page 137:

It should be noted that at the time that the electron microscopy 
experiments on adenovirus were done, no one had cloned a
cellular gene yet. Once the first genes were cloned, introns 
were identified by comparing the cloned genomic DNA with the 
corresponding cloned cDNA. 

This is very different from the situation with respect to the isolated non-coding fragments of
the instant invention. Muzny merely discloses the sequence of a genomic clone containing
chromosome 12 sequences, not isolated SEQ ID NO:4. No indication is provided
in the Muzny disclosure as to which portion of chromosome 12 has actually been sequenced.
Vogelstein merely discloses the MDM2 cDNA. Applicant was the first to identify SEQ ID
NO:4, to determine that it did indeed encode MDM2 and, in particular, to determine the claimed
noncoding sequences.

Second, Appellant asserts that there would not be a reasonable expectation of success
of obtaining the claimed noncoding sequences of SEQ ID NO:4 in view of the cited
references. Vogelstein placed the human MDM2 homologue gene at 12q12-14. As noted
above, there were also disclosures stating that the MDM2 gene was located at
12q14.3-15 (see, for example, Andersen et al., 1996, Mammalian Genome 7:780-783 and
Bureau, 1995, Genomics 28:109-112, submitted in a previous response and attached
hereto as Exhibit 1). Given the conflicting locations published as of the priority
date of the instant application, one of ordinary skill in the art would not have known which
location was actually correct. Resolving this conflict would, in Appellant's view, have
required undue experimentation.

Third, Appellant disagrees with the assertion that the size of the cDNA does not matter 
since ESTs of smaller size are used for comparison to genomic sequences. In Appellant's 
view, the size of the cDNA relative to the isolated genomic clone is of particular significance. 
If, for example, the cDNA constituted 50% of a particular genomic clone, considerably less
experimentation would be involved in determining the sequence of a particular gene than 
when it is merely 1.6% of the genomic clone. 



B. Advisory Action 

The Advisory Action states: 

Applicant reiterates the previous arguments that have been 
answered in the Final Office Action mailed 4/16/07. As 
correctly noted by Applicant, the examiner asserts that if 
genomic and cDNA sequences are known, one of ordinary skill 
in the art would have a reasonable expectation of success of 
obtaining the claimed sequences (Remarks, page 15). Applicant 
does not explain why the fact that cDNA constitutes 1.6% of the 
genomic sequence leads to no expectation of success. 
Applicant further discusses KSR v Teleflex case, stating that 
"under KSR, an approach that is obvious to try is also obvious 
where normal trial and error procedures will lead to the result" 
(page 16). It appears that KSR supports the 103(a) rejection 
because Applicant does not show why the result wouldn't be 
obtained. 

In response, Appellant has asserted throughout prosecution that the fact that the MDM2
cDNA constitutes just 1.6% of the genomic sequence disclosed by Muzny is very significant
since, in Appellant's view, undue experimentation would be involved in isolating the claimed
sequence. More than trial and error would be involved.

Muzny only disclosed the sequence of the clone AC025423. There was no indication
at the time of the filing date of the instant application that AC025423 actually contained SEQ
ID NO:4, more specifically a sequence encoding human MDM2. Muzny did not sequence all
of chromosome 12; they only sequenced the chromosome 12 BAC clone AC025423, less than
0.12% of the chromosome (about 157 kb vs. about 132,000 kb). The clone's sub-site on
chromosome 12 apparently was not then known. Muzny did not even suggest, much less state,
whether AC025423 did or did not contain any gene. The determination of a sequence of
genomic DNA does not necessarily imply the presence of a gene. This is because it was well
known in the art, both when the instant invention was made and as of
the filing date of the instant application, that most genomic DNA consists of "junk" DNA, not
genes. Appellant attaches hereto as Exhibit 2 a review article (Wong et al., 2000,
"Is Junk DNA Mostly Intron DNA?" Genome Research 10:1672-1678) which discusses "junk"
DNA in further detail. As is evident from this review article, junk DNA was well known as
of the priority date of the above-referenced application. When obtaining genomic clone
sequences, one of skill in the art would not have any way of knowing whether a clone
actually contains one or more genes or just junk DNA. Even a high GC content
is a poor guide to a sequence's likely gene content, especially given the presence of
pseudogenes. Therefore, contrary to assertions made in the Office Action, disclosure of this
particular sequence would not indicate to the skilled artisan that this clone would necessarily
contain a gene.
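
For illustration only, the proportion recited above can be checked with a short Python sketch. The function and constant names are hypothetical, and the inputs are the approximate sizes stated in the text (about 157 kb for the clone and about 132,000 kb for chromosome 12), not independently verified figures.

```python
def percent_of(part_kb: float, whole_kb: float) -> float:
    """Return `part_kb` as a percentage of `whole_kb` (same units for both)."""
    return 100.0 * part_kb / whole_kb


# Approximate sizes as recited in the brief (assumed inputs, in kilobases).
CLONE_KB = 157.0
CHROMOSOME_12_KB = 132_000.0

clone_share = percent_of(CLONE_KB, CHROMOSOME_12_KB)
print(f"Clone share of chromosome 12: {clone_share:.3f}%")
```

On these inputs the share comes to roughly 0.119%, which is consistent with the "less than 0.12%" figure stated above.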

The clone AC025423 is 150,579 nucleotides in length. The cDNA sequence contains only
2372 nucleotides (1.6% of AC025423). However, one of ordinary skill in the art
would not know where or how these 2372 nucleotides are interspersed within the AC025423
clone, or indeed whether they are present at all. No teachings are provided as to the structure
of the MDM2 gene itself: the number and size of exons, the number and size of introns, the
locations of exons and introns, and the number and size of 5' and 3' untranslated regions. The
possibilities are close to infinite. Thus there would not be a reasonable expectation of success
in obtaining the claimed sequences, given that the isolation and identification of the claimed
sequences would require undue experimentation.
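
The 1.6% figure follows directly from the two lengths recited in the preceding paragraph. The following Python sketch, offered only as an arithmetic check, uses hypothetical names and takes the lengths (150,579 and 2372 nucleotides) from the text as given.

```python
# Lengths as recited in the brief (assumed inputs, in nucleotides).
CLONE_NT = 150_579   # clone AC025423
CDNA_NT = 2_372      # MDM2 cDNA


def cdna_share_percent(cdna_nt: int, clone_nt: int) -> float:
    """Return the cDNA length as a percentage of the clone length."""
    return 100.0 * cdna_nt / clone_nt


share = cdna_share_percent(CDNA_NT, CLONE_NT)
remainder_nt = CLONE_NT - CDNA_NT  # clone nucleotides not accounted for by the cDNA

print(f"cDNA share of clone: {share:.1f}%")
print(f"Remaining clone nucleotides: {remainder_nt:,}")
```

On these inputs the cDNA accounts for about 1.6% of the clone, leaving 148,207 nucleotides of the clone unaccounted for by the cDNA.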

It is Appellant's view that the combination of the two references would still constitute
"obvious to try" even under KSR v. Teleflex, 82 USPQ2d 1385; 127 S.Ct. 1727 (2007).
Appellant notes that the "Examination Guidelines for Determining Obviousness
Under 35 U.S.C. 103 in View of the Supreme Court Decision in KSR International Co. v.
Teleflex", 72 FR 57526 (October 10, 2007) (hereinafter "Examination Guidelines") stated
with respect to "obvious to try" the following:

To reject a claim based on this rationale, Office personnel must 
resolve the Graham factual inquiries. Office personnel must 
then articulate the following: 

(1) a finding that at the time of the invention, there had 
been a recognized problem or need in the art, which may 
include a design need or market pressure to solve a problem; 

(2) a finding that there had been a finite number of 
identified, predictable potential solutions to the recognized need 
or problem; 

(3) a finding that one of ordinary skill in the art could 
have pursued the known potential solutions with a reasonable 
expectation of success; and 

(4) whatever additional findings based on the Graham 
factual inquiries may be necessary, in view of the facts of the 
case under consideration, to explain a conclusion of 
obviousness. 

Ex parte Kubin, 83 USPQ2d 1410, 2007 WL 2070495 (Bd. App. & Int. 2007) was
cited as an example in the Examination Guidelines of a situation where a finite number of
identified, predictable solutions are provided with a reasonable expectation of success. In
Kubin, the claimed invention was directed to

An isolated nucleic acid molecule comprising a polynucleotide
encoding a polypeptide at least 80% identical to amino acids
22-221 of SEQ ID NO:2, wherein the polypeptide binds CD48.

Id. at 1412.

In the rejection affirmed by the Board, the Examiner asserted that 

The skilled artisan would have been motivated to isolate the 
nucleic acid sequence corresponding to NAIL, based on 
Valiante's disclosure of p38 (which is the same protein as 
NAIL) and Valiante's express teachings how to isolate p38 
cDNA by using conventional techniques, such as taught in 
Sambrook, including using mAbC1.7, a probe specific for p38. 

Id. at 1412.



This finding of obviousness in Kubin appeared to be predicated on the obviousness of
isolating NAIL cDNA, not any other nucleic acid sequences encoding the NAIL protein. The
Board specifically states:

Based on our findings and those of the Examiner, at least one of 
Appellants' claimed polynucleotides would have been obvious
to one of ordinary skill in the art at the time Appellants' 
invention was made 

Appellants argue the "cited references do not provide an 
adequate written description of the claimed nucleic acid 
sequences." (citation omitted). In so arguing, Appellants 
overlook the distinction between obviousness under §103 and 
lack of written description under §112, ¶ 1. A single obvious
species within a claimed genus renders the claimed genus 
unpatentable under §103. Thus, an obvious method of 
obtaining a single nucleic acid molecule encoding NAIL may be 
all that is required to show that the presently claimed genus of 
nucleic acid molecules is unpatentable under §103 

The "problem" facing those in the art was to isolate NAIL 
cDNA, and there were a limited number of methodologies 
available to do so. The skilled artisan would have had reason to 
try these methodologies with the reasonable expectation that at 
least one would be successful. 

Id. at 1414.

The claimed invention is directed to a nucleic acid molecule 20-51039 contiguous
nucleotides in length consisting of a reverse or forward strand of a region of SEQ ID NO:4.
SEQ ID NO:4 is a genomic sequence, not cDNA encoding MDM2. This is very different from
Kubin, which involved deducing the NAIL cDNA sequence from the NAIL polypeptide sequence
and subsequently isolating this sequence. The art cited in the instant application was a cDNA
sequence and a genomic clone containing chromosome 12 sequence. Considerably more, and
undue, experimentation would be involved in identifying and isolating the genomic sequence
and consequently the claimed regions comprising the recited noncoding regions of the MDM2
gene than in identifying cDNA in view of a disclosure of a polypeptide sequence. The
claimed sequence is between 20 and 51039 nucleotides in length. The Muzny sequence is
150,579 nucleotides in length. No direction or frame of reference is provided as to where the
MDM2 sequence could be located. Furthermore, although Vogelstein disclosed the MDM2
cDNA, no direction is provided as to the location of exons, introns, 5' untranslated and/or 3'
untranslated regions. The MDM2 cDNA was only 2372 nucleotides in length. Thus, there
would have been a huge number of strategies and methodologies that one of ordinary skill in
the art could have employed in attempting to obtain the claimed sequences.

Appellant also notes that the Examination Guidelines state: 

The key to supporting any rejection under 35 U.S.C. 103 is the 
clear articulation of the reason(s) why the claimed invention 
would have been obvious. The Supreme Court in KSR noted 
that the analysis supporting a rejection under 35 U.S.C. 103 
should be made explicit. The Court quoting In re Kahn stated 
that "'[R]ejections on obviousness cannot be sustained by 
mere conclusory statements; instead, there must be some 
articulated reasoning with some rational underpinning to 
support the legal conclusion of obviousness.'" 

This statement follows the holding of established case law, most notably In re Beasley, 117 
Fed. Appx. 739, 2004 WL 2793170 (C.A. Fed 2004). In Beasley, the Court held that: 

The examiner and the Board have managed to find motivation 
for substituting one type of memory for another without 
providing a citation of any relevant, identifiable source of 
information justifying such substitution. The statements made 
by the Examiner, upon which the Board relied, amount to no 
more than conclusory statements of generalized advantages and 
convenient assumptions about skilled artisans. 

A similar situation exists with respect to the instant application. An example of such 
conclusory statements appears on page 4 of the Office Action and is shown in boldface: 

It would have been obvious to one of ordinary skill in the art at 
the time the invention was made to use said cDNA to identify 
the genomic DNA that encodes the human MDM2 homolog of 
SEQ ID NO:2 on chromosome 12q12-14. The motivation is 
provided by Vogelstein et al. who teach that it binds to oncogene 
p53 and is diagnostic of tumorigenesis. The state of the art 
provides various techniques for obtaining genomic DNA 
using cDNA probes that are usually labeled. The 
comparison of genomic and cDNA would result in the 
identification of regions comprising exon-intron and intron- 
exon junctions within coding and non-coding regions. 

The Examiner has only referred to Watson, 1992. As previously argued, Watson 
would not apply in this case since in Watson, the genes themselves were actually cloned. No 
other references were provided by the Examiner in response to Appellant's subsequent 
arguments. Furthermore, nothing in the Office Action explains how the state of the 
art teaches that, given a large genomic clone and the cDNA sequence of a 
particular gene, one of ordinary skill in the art could identify with particularity the specific exon 
and intron sequences of that gene and assemble the gene in its entirety. There is no prior art 
that defines the complete genomic structure of a particular gene. This is necessary in order to 
accurately identify the claimed noncoding sequences in the instant invention. 

C. Secondary Considerations 

Appellant also asserts that secondary considerations should apply when 
determining whether the claimed invention is obvious over Muzny in view of Vogelstein. One 
secondary consideration of particular relevance is long felt need. Specifically, there has been 
a great deal of interest in the scientific community in MDM2 given its potential use as a 
diagnostic and therapeutic agent. This interest is summarized in the cited patent, Vogelstein. 
Additionally, Appellant submitted during prosecution an IDS listing four references relating 
to MDM2 (a copy of the 1449 Form previously submitted and made of record is attached 
hereto as Exhibit 3). However, there was absolutely no disclosure or suggestion of the 
genomic organization of MDM2 genomic DNA until the instant application was filed. An 
independent disclosure of the genomic organization of the MDM2 gene was not available 
until July 21, 2004, more than one year after the filing date of the instant application (Liang et 
al., 2004, Gene 338:217-223, submitted herewith as Exhibit 4). The dearth of knowledge 
regarding the MDM2 gene was actually admitted by Liang et al. on the first page of their 
article, where it was stated "Although the human MDM2 cDNA sequence has been reported, 
the genomic organization of the human gene has not been documented". Further, Liang et al. 
discussed the usefulness of determining the genomic structure and organization of the MDM2 
gene on page 218: 

One of the distinctive properties of MDM2 is the possession of 
an extremely complex expression pattern. Its multiple-sized 
transcripts and proteins have been found in tumour samples and 
cell lines by a number of groups (citations omitted). In our 
previous studies, five alternatively sized transcripts of the 
human MDM2 were found in human ovarian tumour, bladder 
tumour and leukaemic cell samples (citation omitted). Here 
we present data demonstrating two further MDM2 transcript 
forms with internal sequence deletions in human tumour tissue. 
We hypothesised that these transcripts are generated by 
alternative splicing. To test this hypothesis and to explore the 
associated mechanisms, we have investigated the genomic 
structure and organization of the human MDM2 gene... 

Conclusion 

It is Appellant's position that the claims are not obvious over Muzny in view of 
Vogelstein. It is further Appellant's position that the claims are in condition for allowance. 
The Examiner is invited to contact the undersigned at (914) 712-0093 if she has any 
questions. 

Respectfully submitted, 



Date: January 16, 2008 /Cheryl H Agris/ 



Cheryl H. Agris, Reg. No. 34,086 
P.O. Box 806 
Pelham, N.Y. 10803 
(914)712-0093 
Customer No. 2 
CLAIMS APPENDIX 



7. An isolated nucleic acid molecule 20-51039 contiguous nucleotides in length consisting of a 
reverse or forward strand of a region of SEQ ID NO:4, wherein said region is selected from the 
group consisting of a 5'-non coding region depicted in nucleotides 51039-41739 of SEQ ID 
NO:4, a 3'-non-coding region depicted in nucleotides 9503-1 of SEQ ID NO:4, a contiguous 
intron-exon region between nucleotides 41738-9502 of SEQ ID NO:4, wherein a sequence 
segment comprising 41738-9502 of SEQ ID NO:4 encodes human mouse double minute 2 
homolog depicted in SEQ ID NO:2, a contiguous exon-intron region between nucleotide 41738- 
9502 of SEQ ID NO:4, wherein a sequence segment comprising 41738-9502 of SEQ ID NO:4 
encodes human mouse double minute 2 homolog depicted in SEQ ID NO:2, an intron depicted in 
nucleotides 36385-40645, 36309-33127, 32994-29616, 29564-25577, 25507-25384, 25287- 
21169, 21006-14110, 13953-13267, and/or 13188-10665, a region comprising a dinucleotide of 
the following group: 41739-41738, 40645-40646, 36309-36310, 36384-36385, 32994-32995, 
33126-33127, 29564-29565, 29615-29616, 25507-25508, 25287-25288, 25383-25384, 25576- 
25577, 21006-21007, 21168-21169, 14109-14110, 13953-13954, 13266-13267, 13188-13189, 
10664-10665 and/or 9504-9503; a transcription binding site selected from the group consisting of 

BINDING SITES huMDM2, location in SEQ ID NO:4 

AP1_C: 36-46, 2876-2886; 
AP4_Q5: 7944-7980; 
AP4_Q6: 7943-59, 8924-8940, 9294-9310; 
ARNT_01: 1682-1706, 2193-2217, 9201-9225; 
BRN2_01: 1040-1058, 7803-7821; 
CAAT_01: 3292-3306; 
CDPCR3HD_01: 6522-6540; 
CEBPB_01: 1424-1438, 3917-3931, 4178-4192, 4787-4801, 6855-6869; 
CREL_01: 5630-5642; 
DELTAEF1_01: 83-95, 6328-6340; 
FREAC7_01: 2757-2773, 5154-5170, 5823-5839; 
GATA1_04: 4846-4858, 7017-7029; 
GATA1_05: 8464-8476; 
GATA2_02: 6045-6057, 6073-6085, 6142-6154; 
GATA2_03: 2489-2501, 3323-3335, 3384-3396, 7393-7405; 
GATA3_02: 3264-3276, 6870-6882; 
GATA3_03: 40-52, 5729-5741, 6529-6541, 6874-6886, 7041-7053, 7589-7601; 
GATA_C: 7349-7361, 8188-8200; 
HFH2_01: 1743-1759, 7995-8011; 
HFH3_01: 502-518, 1739-1755, 4160-4176, 9402-9418, 9418-9434; 
HFH8_01: 8184-8200; 
IK2_01: 951-963, 3588-3600; 
MZF1_01: 1202-1210, 1447-1455, 4997-4005, 5424-5432; 
NF1_Q6: 1480-1500, 8166-8182; 
NFAT_Q6: 4190-4208, 6009-6027; 
NKX25_01: 741-755, 1648-1662, 1885-1899, 1984-1998, 3609-3623, 4928-4942, 
5060-5074, 5889-5903, 8850-8864, 9190-9204; 
NKX25_02: 2584-2599, 2970-2984, 4644-4658, 5179-5193, 6482-6496; 
NMYC_01: 2560-2572; 
RORA1_01: 220-238, 2638-2656; 
S8_01: 4644-4656, 4842-4854, 4845-4857, 5200-5212, 5371-5383, 5735-5747, 
6482-6494, 6541-6553, 6544-6556, 6772-6784, 7270-7292, 7273-7285; 
SOX5_01: 1355-1371, 1430-1446, 3094-3110, 3155-3171, 4669-4685, 4692-4708, 
4789-4805; 
SRY_02: 4164-4180, 5665-5681; 
TATA_01: 1261-1277, 2574-2590, 2723-2739, 2733-2749, 2770-2786, 4199-4215, 
4206-4222; 
TATA_C: 5900-5916, 7456-7472, 7702-7718, 7917-7933; and 
XFD2_01: 7702-7218, 7917-7933; 



a transcription binding site selected from the group consisting of 

BINDING SITES huMDM2, location in SEQ ID NO:4 

AP1_C: 12109-12119, 12695-12705, 22600-22610, 24166-24176, 31311-31321, 35234-35244, 
39184-39194; 
AP1_Q2: 11952-11962, 12068-12078, 14798-14808, 21748-21758, 22613-22623, 23676-23686, 
26562-26572, 30046-30056; 
AP1_Q4: 12695-12705, 31311-31321, 35234-35244, 36295-36305, 38784-38794, 39188-39198; 
AP4_Q6: 31635-31651; 
BRN2_01: 13448-13466, 14764-14782, 28094-28112, 40027-40045; 
CAAT_01: 11288-11302, 15054-15068; 
CDPCR3HD_01: 11286-11304, 13284-13302, 20846-20864, 29344-29362; 
CEBPB_01: 29241-29255; 
CREL_01: 36091-36103, 38873-38885; 
DELTAEF1_01: 18083-18095, 20385-20397, 26955-26967; 
FREAC7_01: 11982-11998, 15187-15202, 16523-16539, 16529-16545, 16587-16603, 16604-16620, 
16676-16642, 16633-16649, 16644-16660, 16650-16666, 16657-16673, 16673-16689, 16762-16778, 
21332-21348, 25689-25700, 26529-26545, 27767-27783, 29495-29511; 
GATA1_02: 10916-10928, 15775-15789, 18162-18174, 26088-26100, 32518-32530; 
GATA1_03: 28012-28024; 
GATA1_04: 11153-11165, 11630-11642, 13778-13790, 17439-17451, 19300-19312, 21606-21618, 
22743-22755, 23747-23759, 25806-25818, 26529-26541, 29424-29436, 30455-30467, 32761-32778, 
33352-33364, 33960-33972, 36101-36113, 40007-40019; 
GATA1_05: 11590-11602, 26550-26562, 36737-36749; 
GATA1_06: 18772-18784, 23054-23066, 35568-35580, 37855-37867; 
GATA2_02: 20755-20767, 30830-30842, 34755-34767, 36285-36297, 39143-39155, 39641-39653, 
40586-40598; 
GATA2_03: 13535-13547, 22711-22723, 23161-23173, 25028-25040, 27237-27249, 36277-36289; 
GATA3_02: 11558-11570, 16470-16482, 17225-17237, 19619-19631, 22156-22168, 22443-22455, 
24713-24725, 27619-27631, 32716-32728, 34124-34136, 34163-34175, 36832-36844, 38403-38415; 
GATA3_03: 10869-10881, 11515-11527, 13845-13857, 17221-17233, 18952-18964, 20050-20062, 
40171-40183; 
GATA_C: 15848-15860, 18899-18911, 23640-23652, 29072-29084, 30881-30893, 33198-33210, 
37472-37484, 38621-38633; 
GFI1_01: 35469-35481, 35492-35504; 
HFH2_01: 15939-15955, 24636-24652, 25866-25882, 32171-32187, 35372-35388, 39457-35473; 
HFH3_01: 13340-13356, 19218-19234, 21328-21344, 21336-21352, 21344-21360, 28062-28078, 
32125-32141; 
HFH8_01: 14133-14149, 22578-22584; 
HNF3B_01: 13150-13166, 16505-16521, 25264-25280, 29443-29459, 37654-37670; 
IK2_01: 11547-11559, 17144-17156, 18961-18973, 23883-23895, 27617-27629, 28908-28920, 
29241-29253, 30752-30764, 34768-34780; 
LYF1_01: 12319-12331, 19191-19203, 37226-37238, 39430-39442; 
MAX_01: 22974-22986, 33339-33351; 
MZF1_01: 26105-26113, 35187-35195; 
NF1_Q6: 12048-12064, 33334-33354; 
NFAT_Q6: 13295-13313, 14157-14175, 14311-14329, 14414-14432, 18269-18287, 19326-19344, 
20801-20819, 21177-21195, 22537-22555, 23861-23879, 25392-25410, 25879-25897, 27524-27542, 
30636-30654, 30718-30736, 31525-31543, 33655-33673, 34726-34744, 34917-34535, 34990-35008, 
35979-35997, 36479-36493, 36577-36595, 37154-37172, 40224-40242, 40365-40383; 
NKX25_01: 12041-12055, 12340-12354, 12471-12485, 12742-12756, 12877-12891, 13849-13863, 
18995-19009, 21440-21454, 21883-21897, 28426-28440, 30964-30978, 32033-32047, 32265-32279; 
NKX25_02: 10998-11012, 12711-12725, 14131-14145, 14726-14740, 16024-16038; 
NMYC_01: 18753-18765, 18754-18766, 23076-23088, 30534-30546, 34400-34412; 
RORA1_01: 13134-13152, 22966-22984, 24934-24952, 33341-33359, 34760-34778; 
S8_01: 11000-11012, 11977-11989, 12048-12060, 12051-12063, 13747-13759, 13923-13935, 
13926-13938, 14676-14688, 14679-14691, 16026-16038, 16313-16325, 16316-16328, 17515-17527, 
20756-20768, 20759-20771, 23154-23166, 23157-23169, 25198-25210, 25201-25213, 26651-26663, 
27508-27520, 27511-27523, 29450-29462, 29478-28490, 29775-29787, 29778-29790, 29813-29825, 
29816-29828, 31329-31341, 31677-31689, 31680-31692, 31732-31744, 31735-31747, 36137-36149, 
36140-36152, 36812-36824, 36815-36827, 37413-37425, 38679-38691, 39474-39486, 39477-39489; 
SOX5_01: 27397-27413, 27572-27588, 28100-28116, 29230-29246, 29439-29455, 30690-30706, 
31595-31611, 33871-33887, 34113-34129, 34624-34640, 37668-37684, 38582-38598, 39124-39140, 
40410-40426; 
SRY_02: 20016-20032, 22410-22426, 27329-27345, 29162-29178, 29499-29515, 30646-30662, 
31503-31519, 35928-35944, 37324-37340; 
TATA_01: 32722-32738, 32729-32745, 32807-32823, 33825-33841, 34120-34136, 35433-35449, 
36593-36609; 
TATA_C: 11015-11031, 11817-11833, 13635-13651, 14930-14946; 
TCF11_01: 18543-18549, 22574-22580, 31281-31297, 31489-31505, 38754-38770; 
USF_01: 23075-23087, 32577-32589; 
VMYB_02: 11526-11538, 17384-17396, 18400-18412, 19549-19561, 22188-22200, 40486-40508; and 
XFD2_01: 16620-16636, 18153-18169, 22102-22118, 23141-23157; 
and a transcription binding site selected from the group consisting of 

BINDING SITES huMDM2, location in SEQ ID NO:4 

AP1_C: 44584-44594, 49069-49079; 
AP1_Q2: 42174-42184, 45217-45227, 48422-48422, 50447-50457; 
42702-42712, 50806-50816; 
42117-42133, 42118-42134, 42244-42260, 45432-45448; 45433-45449, 46609-46625; 
42310-42328, 44022-44040, 47514-47532, 48900-48918, 48967-48985; 
44866-44880; 
45671-45689, 49219-49237; 
42437-42449, 49797-49809; 
47026-47042, 47292-47308, 47658-47674; 
43482-43494, 48926-48938, 49284-49296; 
47371-47383; 
43054-43066, 43162-43162, 43967-43979, 45464-45476, 45916-45928, 47763-47775; 
49319-49331, 49459-49471; 
47590-47602; 
GATA2_02: 42660-42672, 43475-43487; 
GATA2_03: 43714-43726, 50948-50960; 
GATA3_02: 49155-49167, 49844-49856; 
GATA3_03: 42202-42214, 44810-44822, 48438-48450, 49136-49148, 49337-49349, 49869-49881; 
GATA_C: 44011-44023, 45256-45268, 45823-45835, 47915-47927, 49201-49213, 49573-49585; 
GFI1_01: 46606-46618, 47063-47075; 
HFH3_01: 47030-47046, 47284-47300, 47288-47304; 
IK2_01: 45275-45287; 
LYF1_01: 44564-44576, 46991-47003, 49567-49579; 
MAX_01: 43234-43246, 48726-48738; 
MZF1_01: 41772-41780, 42290-42298, 42295-42303, 44507-44515, 45105-45113, 45203-45211, 
49948-49956, 50774-50782; 
NF1_Q6: 50209-50229; 
NFAT_Q6: 42061-42079, 44418-44436, 46399-46417, 47974-47992, 49267-49285, 49964-49982, 
50392-50410; 
NKX25_01: 42394-42408, 43507-43521, 46115-46129; 
RORA1_01: 45073-45091, 48718-48736; 
S8_01: 43552-43564, 45214-45226, 47160-47172, 48419-48431, 49295-49307, 50379-50391; 
SOX5_01: 43716-43732, 46351-46367, 47156-47172, 47774-47790, 47868-47884, 
47974-47990, 48915-48931, 50323-50339; 
TATA_01: 45588-45604, 47625-47641, 48026-48042, 48659-48675, 49056-49072, 49079-49095, 
49152-49168; 
TCF11_01: 49115-49131; 
VMYB_02: 42010-42022, 42279-42291, 44651-44663; and 
XFD2_01: 42870-42886, 42910-42926. 



10. A composition comprising the nucleic acid molecule of claim 7 and a carrier. 

15. A kit comprising the nucleic acid molecule of claim 7. 

16. The kit according to claim 15, in which the nucleic acid molecule is labeled with a 
detectable substance. 

17. A solid support comprising the nucleic acid molecule of claim 7. 

18. The solid support of claim 17 wherein said support is a microarray. 

20. The solid support of claim 18, which further comprises a nucleic acid molecule encoding 
human mouse double minute 2 homolog, complementary sequence thereof or a portion of 
said nucleic acid molecule containing at least 20 contiguous nucleotides. 

22. A method of identifying variants of SEQ ID NO:4, or its complementary sequence, 
comprising isolating genomic DNA from a subject and determining the presence or absence of a variant 
in said genomic DNA using the nucleic acid molecule of claim 7. 

23. A method for detecting the presence or absence of SEQ ID NO:4 or its complementary 
sequence in a sample, said method comprising (a) contacting the sample with the nucleic acid 
molecule of claim 7 and (b) determining whether the nucleic acid molecule binds to said 
nucleic acid sequence in the sample. 

24. An isolated nucleic acid molecule 20-5000 contiguous nucleotides in length consisting of 
a reverse or forward strand of a contiguous exon-intron region between nucleotides 41738-9502 
of SEQ ID NO:4, or contiguous intron-exon region between nucleotides 41738-9502 of SEQ ID 
NO:4, wherein a sequence segment comprising 41738-9502 of SEQ ID NO:4 encodes human 
mouse double minute 2 homolog depicted in SEQ ID NO:2. 

25. The isolated nucleic acid molecule of claim 24, wherein said nucleic acid molecule is 20- 
5000 contiguous nucleotides in length and comprises nucleotides 41739-41738, 40645-40646, 
36309-36310, 36384-36385, 32994-32995, 33126-33127, 29564-29565, 29615-29616, 25507- 
25508, 25287-25288, 25383-25384, 25576-25577, 21006-21007, 21168-21169, 13953-13954, 
14109-14110, 13188-13189, 13266-13267, 10664-10665 and/or 9504-9503 of SEQ ID NO:4 or 
their reverse strands. 

30. A microarray comprising a plurality of the nucleic acid molecules of claim 7. 

31. The microarray of claim 30 wherein said microarray further comprises a nucleic acid 
molecule encoding human mouse double minute 2 homolog, complementary sequence thereof or 
a portion of said nucleic acid molecule containing at least 20 contiguous nucleotides. 

32. A method for detecting the presence of a nucleic acid sequence of SEQ ID NO:4 or its 
complementary sequence in a sample, said method comprising contacting the sample with the 
nucleic acid molecule of claim 7 and determining whether the nucleic acid molecule binds to 
said nucleic acid sequence in the sample. 

Human Chromosome (Chr) 12q13 is a region of clinical interest in 
that a variety of disease phenotypes have been localized to the 
area. A number of relatively detailed genetic maps of the region 
around human Chr 12q13 are now available. A second-generation 
YAC contig map of human Chr 12, which extensively covers this 
area, has also been produced. This has resolved a number of 
discrepancies in the previous genetic maps. The current YAC contig 
map contains only five small gaps in the approximately 45 cM 
region between the polymorphic microsatellite markers D12S333 
and D12S106. A number of expressed sequences have also been 
localized with varying degrees of precision to the region around 
12q13. In an attempt to generate YAC resources with low levels of 
chimerism and to facilitate disease and EST mapping in this 
genomic area, we now describe the isolation of 130 YACs from the 
ICI library (Anand et al. 1990). We have used 36 genetic markers 
and 11 ESTs from five previous Chr 12 maps (Guyer and Cann 
1992; Weissenbach et al. 1992; Schoenmakers et al. 1994; 
Kucherlapati et al. 1994; Gyapay et al. 1994). Our data can be integrated 
with the most recent second-generation YAC contig map of Chr 12 
(Krauter et al. 1995). We have mapped the WNT-1 gene (Nusse et 
al. 1991) at 62-66 cM (based on the 200 cM overall Chr 12 map 
of Krauter et al. 1995) in close proximity to D12S339 (in the contig 
gap between D12S85 and D12S361) and the GPD1 gene. This is an 
area under-represented in the latest Chr 12 contig map. We have 
also observed close linkage between GADD153 and GLI at 77 cM 
in the interval between D12S312 and D12S90 (Krauter et al. 1995). 
The markers D12S17 and D12S96 also co-localize at 71 cM in the 
area between D12S390 and D12S398, which has enabled us to 
position several first-generation genetic markers with respect to 
these points. We have isolated YACs for LRP/A2MR, which is 
known to be physically linked to GLI (Forus and Myklebost 1992). 
These may be of value in closing a gap at 76 cM in the current map 
immediately centromeric of GADD153/GLI. The YAC 11GH7 
containing D12S312 may also be useful in this respect as it is the 
closest known marker proximal to the same contig gap. Similarly, 
YACs containing MDM2 and D12S43 may help to close the gap 
between RAP1B and D12S80 at 85 cM. As well as YACs 
containing COL2A1 and LALBA, which are included in the latest 
contig map, we have isolated YACs for PMCA, CD63, and PAB1, 
which are as yet not accurately positioned on the map. D12S59 or 
IGF-1 containing YACs described herein are now known to map 
distal to our region of interest at 1)3 cM and 118 cM respectively 
(Krauter et al. 1995). Vectorette YAC end sequencing (Riley et al. 
1990) has enabled us to generate other novel STSs and 
corresponding YACs from the region. 

Novel YACs identified with 36 genetic markers from the 
12q13 region and from 11 expressed sequences in the region are 
presented in Table 1. Where known, a corresponding YAC from 
the second-generation YAC contig map of human Chr 12 is 
included in Table 1 for ease of cross-reference (Krauter et al. 1995). 
The 47 loci screened all produced at least one positive YAC and, 
in most cases, multiple YACs, consistent with a 3.5-fold genomic 
representation of this library (Anand et al. 1990). In a number of 
cases, the same YAC was identified with more than one genetic 
marker, implying that these markers lie in close physical proximity 
(the average insert size of the YACs in this library is 350 kb). 
D12S339 and the WNT-1 oncogene both gave positive signals 
with YACs 26BB4, 28AD4, and 32BC2 in the region of the 64 cM 
contig gap (Fig. 1). D12S368 and D12S174 at around 68 cM both 
identified YAC 30AD11. D12S17 and D12S96 both identified the 
YACs 20BB4 and 20BF12. This places D12S17 at the 71 cM locus 
in the middle of the KRT gene family and centromeric of 
D12S398, D12S359, D12S19, D12S325, HOXC5, and D12S103, 
respectively. GADD153 and GLI at 76 cM identify the YAC 
26EG10, confirming the close physical linkage of these two genes. 
This locus is immediately telomeric of a gap at 76 cM in the 
current second-generation YAC contig map, and end sequencing 
of 26EG10 may assist in contig closure. The markers D12S305 and 
D12S104 at 77 cM both identify the YAC 10ED1, concordant with 
the second-generation map. End sequencing of 10ED1 and 
rescreening of the YAC library identified 15BE5. This YAC also 
contains D12S90, confirming marker order D12S90, D12S305, 
D12S104. D12S83 and D12S334 both identify the YAC 34CA7, 
confirming their physical proximity at 77 cM also (Krauter et al. 
1995). 

YACs for which sequencing of human DNA insert termini has 
been performed are shown in Table 1. Following confirmation of 
localization to Chr 12 with the Coriell Mapping Panel, additional 
YACs were isolated as shown (Table 1). These increase the YAC 
coverage at D12S96/17 (71 cM, 20BF12), D12S90/305/104 (77 
cM, 10ED1, 15BE5), and D12S43 around the 86 cM region 
(34HH10). A number of YACs proved to be chimeric during 
analysis; the right-hand end of 40AA12 (a YAC identified using 
primers from the MDM2 gene) maps to human Chr 1. The MDM2 
YAC 6FD11 is also highly chimeric by FISH analysis (data not 
shown). The MDM2 YAC 40CB5 was not chimeric by FISH 
analysis and was used to confirm its localization to the 12q14 
region (Fig. 2; Heighway et al. 1994). The other YAC termini 
sequences in Table 1 all map to Chr 12 by PCR analysis of somatic 
cell hybrids. 

In the last four years, six separate Chr 12 maps have been 
presented which include markers in and around the 12q13 region. 
The NIH/CEPH collaborative mapping group comprehensive 
genetic linkage map of the human genome (Guyer and Cann 1992), 
in conjunction with data of Schoenmakers and associates (1994), 
provided an approximate marker order cen-COL2A1-ELA1- 
D12S29-D12S15-D12S25-D12S14-D12S4-D12S18-D12S16- 
D12S17-D12S6-D12S28-D12S22-D12S28-D12S19-D12S43- 
D12S8-D12S64-D12S7-tel with the latter probably located 
distally in 12q15. In complementary studies in 1992, Weissenbach 
and colleagues described another set of markers across the same 

Table 1. YACs identified with known genetic markers and ESTs. YACs in bold are 
detected by more than one marker. CEPH YACs are included for ease of cross 
reference. YACs identified by rescreening the library with Vectorette-generated end 
sequences are also included (Riley et al. 1990) as are the novel STSs generated in this 
process. The NIGMS monochromosomal human/rodent somatic cell-hybrid mapping 
panels #1 and #2 (NIGMS, Coriell Repository, Camden, N.J., USA) were screened by 
PCR amplification with newly generated oligonucleotides from YAC insert termini as 
primers. PCR conditions were as described for the YAC library screening. The panel 
was screened to confirm localization to human Chr 12 and thus to ensure, to a first 
approximation, that the identified YACs were not chimeric. These sequences have 
been assigned GenBank accession numbers U51106-U51111. 



Markers     Tm (°C)  MgCl2 (mM)  Identified ICI YACs                                CEPH YACs 

D12S17      58.5     1.5         20BB4; 20BF12                                      — 
D12S43      55.0     1.8         9GC9; 34HH10                                       — 
D12S58      53.0     2.0         27ID4; 27ID8; 29GE6                                — 
D12S59      53.0     1.3         18H2; 13GD4                                        707f6 
D12S80      53.0     2.0         5FF10; 1SEG1; 19DH6; 32HE12; 34DD7; 37FC5; 3SGGS   749a7 
D12S83      50.0     1.5         34CA7                                              799a11 
D12S85      53.5     2.0         5IA12; 12HC4; 12IE9; 18CH4; 21GB5; 37EH10; 39DB1   690h1 
D12S87      52.0     2.0         1HG9; 10IG5; 10IG6; 23BA10; 37AE2; 37FA3           759b2 
D12S90      53.0     1.5         4DD8; 9EGS; 15BE5; 23ID1; 34HA8; 36IC2             770a7 
D12S96      53.0     2.4         20BB4; 20BF12                                      934b7 
D12S103     53.0     1.5         7DF1                                               790f7 
D12S104     52.0     1.0         10ED1; 30AA4; 39EH4                                907f2 
D12S106     53.0     1.5         33FH8; 35HF4; 39IH2                                763a8 
D12S113     49.0     1.5         5HB11; 18DA9                                       — 
D12S131     54.5     1.5         pool 8A                                            — 
D12S137     55.0     1.5         40AF6                                              396c10 
D12S174     59.5     1.5         30AD11                                             717g8 
D12S305     51.0     1.5         10ED1; 20HH9; 38BC9                                907f2 
D12S312     55.0     1.5         11GH7                                              790f7 
D12S313     51.0     1.5         21AA8                                              751a4 
D12S325     52.0     1.5         8BB12; 8EC4; 24BF5; 39BB1                          928a12 
D12S326     54.5     1.5         30FES                                              806c12 
D12S329     56.5     1.5         11GC11; 16GG9; 24FH7                               970d8 
D12S331     51.5     1.5         9GD6; 11FD5; 40GB12                                690h5 
D12S333     53.0     1.5         6HG8; 12GG4; 12GH4; 19DD7; 34DD8                   952a6 
D12S334     49.0     1.5         34CA7; 35ID6                                       799a11 
D12S335     52.5     1.5         pool 15F; 34G; 35G; 39F                            755d7 
D12S337     55.0     1.5         18AG10; 20EB9; 22EC12; 27BH9                       925h12 
D12S339     57.5     1.5         5FA12; 9CG7; 26BB4; 28AD4; 32BC2 
D12S345     49.0     1.5         19FF10; 29HD4; 29HD11; 33HH12; 37IH1               951h6 
D12S347     53.0     1.5         11BD7                                              951a5 
D12S350     57.0     1.5         16BB1; 24CE11; 27EE5; 29CE3; 35CC11; 39HB2; 40CC4  814f12 
D12S355     52.5     1.5         39CB8                                              959a5 
D12S361     53.0     1.5         26BG9; 40CD11                                      951a5 
D12S368     53.0     1.5         30AD11                                             717g8 
D12S371     55.5     1.5         9CB11; 10IH6; 23CC10; 37HC10                       926h3 

Genes       Tm (°C)  MgCl2 (mM)  Identified ICI YACs 

LALBA       53.5     1.5         38FA5 
COL2A1      57.0     3.0         IIA5; 258F10 
GADD153     58.0     1.5         26EG10 
GLI         53.5     1.5         26EG1; 25EC10 
WNT1        55.0     1.5         26BB4; 28AD4; 32BC2 
MDM2        56.5     1.5         6FD11; 40AA12; 40CB5; 26DC3 
LRP/A2MR    56.0     1.5         13FE9; 15DD6 
PAB         52.5     1.5         11GG9 
CD63        58.0     1.5         26AB8; 27GE12; 33AG6 
PMCA        65.0     3.0         4AF4 
IGF1        61.0     2.5         17DF5; 30EC4; 39CF12 

Original YAC  Left side YACs                                  Right YACs 

10ED1         30AA4; 39ED3                                    15BE5; 29IE10 
20BF12        18EF6; 19IC6; 32DA10; 36EF3; 37DD1; 37IF5       N/D 
34HH10        14BC11                                          N/D 



Table 1. Continued. 

YAC clone   Right end sequences 

10ED1 (Chrom 12 end) 
5'-7GC ICC TIC CCC TTT ACA CTT TCG CTC.TTA CAT CAT CGG CAC TAT TTA GCG 
AAC CTA TTA TGC TTC GAG GGG CCT HA TTG CA6 AAT CAA CaC AAT CAC CAC 
acc tag aaa or cag aaa aca cct cgc tat rrc gca ata ata tat tta caa 
gtcaaaagc TArCTT ATAACTAAG aagatt acctgt T-3' 

40AA12 (Chrom 1 end) 
5'-CCA ATT CAC TTA CTA AAA AAT ACA CAT TTA ATA TAC TAC ATA ATC AAG TTG 
AGG ATC TAC ATA TTC ACC ATA ATA ATT AAA ATT CTC CAA ACA GTG CCC ACC 
TTT CAA ATC AAC TTA CAT ATC ACT ATC TTT TAA ACT AAG AAT CCA TCA TIG 
cat tgc act aga tcg acc gag acc cag cca t-3' 

YAC clone   Left end sequences 

10ED1 (Chrom 12 end) 
5'-HC TAT ATC att gag taa gat aaa gag cca aga ctc TGT ATT GAT AAA CAA 
TTG ATC TTT GrA TTT TCC TGA CCT TAC AAA AAT ATT CCA CAT TTT C TC CTA AAA 
CCT TTA AAC T n TAG AAA TTA ATT AAA TCT TCC ATC CAC TTC AAA TTT GTG TTC 
tgt cta taa tat gaa cta aac att cag ah cat tgc ctc ata tgt tat tca-3' 

20BF12 (Chrom 12 end) 
5'-CAA TTC TTC CAG GAT AAA TGGCtGCGA TAG GTG CCA CGA GAA ACC AAT CCT 
CAC CCT ATT CCC GCA TGC ACT TCA TCA GAG CAA TTA TAG ACT GGT CTG TAT GTA 
ctc tca cag aag ax ttt gct cag aag tca tct tag tta ha An aca cca ata 
aaa ttc ttg atg GTC aaa tag gt« 3' 

34HH10 (Chrom 12 end) 
5'-AAC CAT TTC CAT CAC XA CAG ACT CCC CAA GCA AIA GAG TAG AAC TAG GAG 
ATG ACT CCT GCC ATG GAA GCA AAA CTC AA-3' 

40AA12 (Chrom 12 end) 
5'-GTT CCT TTA AGG CCC AAC ACT TTA An TAT CAC TAC GGA ATT CAt TTT ATA 
AT.T GAA ATC TCA TCC ACA AAT TTT AAC ACT ATA TO AGC AAA TGA TAA ACA 
TAT TTT CCA CC-3' 



region from D12S87 to D12S106, a region at that stage thought to 
be of around 35 cM. The reported order of markers was 
cen-D12S87-D12S85-D12S96-D12S103-D12S90-D12S104- 
D12S83-D12S102-D12S80-D12S92-D12S106-tel. It was not 
clear at that time how these two different sets of markers 
overlapped. This was established to some extent in the report of the 2nd 
International Workshop on Human Chromosome 12 Mapping 
(Kucherlapati et al. 1994). 

The 1993/94 Genethon human genetic linkage map (Gyapay et 
al. 1994) described 28 genetic markers over an expanded 45 cM 
region flanked by D12S87 and D12S106. This map was integrated 
with that of Weissenbach and coworkers (1992). Most recently, a 
second-generation YAC contig map of human Chr 12 has been 
produced (Krauter et al. 1995). This represents an almost complete 
physical map of the region 12cen-12q15 and resolves many of the 
conflicts concerning accurate ordering of genetic markers. Indeed, 
the contig map places D12S87, previously viewed as a 12q11 
marker, on the short arm with D12S333 even more distal on 12p11. 
The marker D12S331 is now positioned at 55 cM just below the 
centromere on the q arm. However, this latest map still contains 
five gaps in the region we have been investigating and does not 
include many first-generation genetic markers, some of which can 
now be placed on the map. 

While the second-generation YAC contig map of the 12q13
region can now be considered as definitive, comparison of the data
therein with previous genetic maps and the results described in this
paper allows us to draw some useful additional conclusions. Working
from centromere to qter, there is an EST (ATP5B) mapping in
the 12q11-12 region (Kucherlapati et al. 1994) that is not included
in the YAC contig map. This presumably lies between the centromere
(54 cM) and the LALBA/D12S120/COL2A1/VDR/D12S85
locus, which is now well resolved on the physical map but was not
clearly discriminated on the genetic map. It is therefore possible
that ATP5B maps in the first gap in the current YAC contig at
around 59 cM. The marker D12S339, which was included between
D12S85 and D12S361 (around the 64 cM mark) in the 1993/94
Généthon map, is not included in the 1995 YAC contig map. We
have now shown that D12S339 is closely linked physically to
WNT1. It seems probable that these two markers map in the second
contig gap around 64 cM on proximal 12q. The 1994 Chromosome
12 Workshop map links WNT1 with D12S131, D12S6,
D12S17, and D12S71 (Kucherlapati et al. 1994). Our data contradict
these results in that we have demonstrated tight physical linkage
between D12S17 and D12S96, the latter now mapping more
distally at 71 cM in the middle of the keratin gene cluster.
Nevertheless, it would be worthwhile to screen YAC libraries with
D12S6 and D12S71 (D12S131 is reported herein), because it is
conceivable that these may also lie in the 64 cM 12q YAC contig gap.
The data of Schoenmakers and associates are also valuable in this
regard. They localized a series of first-generation Chr 12 markers
to the region between COL2A1 and D12S17. Their approximate
locus order was
cen-ELA1-D12S29-D12S15-D12S25-D12S14-D12S4-D12S18-D12S16-D12S17-tel.
They also mapped D12S6
distal to D12S17, which by our data would also place it distal to
D12S96. Thus, the two genetic maps are somewhat at variance.
However, ELA1, D12S29, D12S15, D12S25, D12S14, D12S4,
D12S18, and D12S16 would be potential markers to use to identify
YACs in an attempt to fill the second YAC contig gap at 64 cM.

Our physical linkage of D12S17 and D12S96, plus the 1995
localization of D12S19 at 71 cM (Krauter et al. 1995), would
suggest (by comparison of the 1992 NIH/CEPH map with the data
of Schoenmakers et al.) that D12S6, D12S28, and D12S22 map
between D12S96 and D12S19. This can now be easily confirmed
because the YAC contig is complete between these points. Again,
the 1992 NIH/CEPH map and Schoenmakers and colleagues
(1994) have previously placed the markers D12S43, D12S8,
D12S64, and D12S7 distal to D12S19. The 1994 workshop report
(Kucherlapati et al. 1994) linked D12S8, D12S43, and LYZ with
D12S80, which is now positioned on the YAC contig at 87 cM.
This is immediately telomeric of a fourth gap in the YAC contig at
85 cM. D12S8, D12S43, and LYZ are useful markers to screen
YAC libraries in an attempt to bridge the 85 cM contig gap
between RAP1B and D12S80. Furthermore, the 1994 workshop
report (Kucherlapati et al. 1994) links the IFN-γ and RAP1B loci
with D12S56. Again, D12S56 may be useful in filling the gap in
this region. Moreover, Bureau and coworkers (1995) have mapped
the MDM2 gene in close linkage to IFN-γ, but the MDM2 gene is
not included on the 1995 contig. It is, therefore, also possible that
MDM2 lies in the same contig gap. In support of this, the MDM2
YACs differ from our D12S313 YAC immediately proximal to
IFN-γ. This suggests that MDM2 may be distal to IFN-γ. While two
of the YACs described herein for MDM2 are highly chimeric
(6FD11 and 40AA12), the YACs 40CB5 (Fig. 2) and 26DC3
(Heighway et al. 1994) have been localized by FISH analysis to
12q14.3-q15 and may be useful in bridging the contig gap. This
cytogenetic localization is in agreement with the reported
localization of IFN-γ (Bureau et al. 1995) and with the mapping of the
slightly distal markers D12S213 and D12SU5 to 12q14-q15 by
Fejzo and associates (1995). It is interesting that the previously
noted coamplification of the GLI, CDK4 and MDM2 genes in
human sarcomas (Khatib et al. 1993) is entirely consistent with the
genetic map of this region. The YACs for D12S350 and D12S326
may prove useful in bridging a fifth gap, which they flank at 91 cM
in the current contig map.

D12S64 may be some way distal to the D12S8/D12S43/D12S80
locus at 12q14-q15, and in fact D12S7 maps at 103 cM,
considerably distal to the D12S106 marker at 98 cM which we
have taken as the boundary of our area of investigation. Similarly,
D12S58, for which three YACs are described herein, now maps at
113 cM, beyond the 12q15 region. At the centromeric end of the
region, D12S59 and D12S345 both map adjacent to the centromere
on the short arm of Chr 12 (Kucherlapati et al. 1994; Krauter
et al. 1995). Our D12S87 YACs also map on the p arm, as do our
most distal YACs for D12S333 (12p11).

Of the ESTs that have been used in this study to generate YAC
clones, LALBA, COL2A1, GADD153, and GLI are already positioned
on the 1995 YAC contig map at around 60, 61, 77, and 77
cM respectively. The IGF1 gene, for which we have identified a
number of YACs, maps distal to D12S106 at around 118 cM.
These YACs are therefore included in Table 1 only for archival
purposes. Of the ESTs for which we have isolated YACs that are
not yet positioned on the 1995 YAC contig map, the WNT1 clones
promise to be of value in that they map with D12S339 in a current
contig gap at 64 cM. Similarly, our MDM2 YAC clones may map
in the current YAC contig gap at 85 cM. At present, the precise
positions of PMCA, CD63, and PAB1 on the physical map remain
to be confirmed. Finally, we have isolated YACs for the LRP/A2MR
gene (Paulien et al. 1992), which has been physically
mapped 300 kb from GLI (Forus and Myklebost 1992). These may
be useful in closing a third YAC contig gap in this region between
D12S312 and GADD153 at 76 cM, as indeed may the D12S312
and GADD153/GLI YACs described herein. The availability of a
wide range of YAC reagents from the 12cen-12q15 region will
without question facilitate gene cloning exercises in this area of the
genome and should allow us to resolve some of the outstanding
questions concerning the frequent translocations encountered in
the region in a variety of malignancies.

Fig. 1. Schematic 12p11-12q15 genetic map highlighting gaps in the
current YAC contig and the positioning of additional markers for some of
which new YACs are described herein. The ICI YAC library (Anand et al.
1990) was screened by the polymerase chain reaction (PCR) with available
genetic markers from the region. Standard PCRs contained 100 ng pooled
YAC DNA solution, 30 pmol of each primer, 1.0× Taq buffer (Promega),
0.2 mM dNTPs (Promega), 1 U Taq DNA polymerase (Promega), and 1.5
mM MgCl2 except in specific cases shown in Table 1 where higher MgCl2
concentrations were necessary. All PCRs were carried out with a Techne
PHC-3 thermal cycler. The denaturing step was 5 min/95°C for purified
DNA; 10 min/95°C for whole yeast cells; followed by 38 cycles of 30
s/95°C, 30 s/primer Tm and 1 min/72°C with a final elongation step of 10
min/72°C.

Fig. 2. Fluorescent in situ hybridization analysis
of the MDM2 YAC 40CB5 confirming
localization of this gene to 12q14-15.

Interferon-γ is a cytokine with multiple effects. It
interferes with the replication of several viruses and
plays a key role in the regulation of immune responses.
Therefore, the gene coding for interferon-γ could be
implicated in the susceptibility of humans to several
diseases. We have localized this gene close to the
D12S335 and D12S313 microsatellites on both the physical
and the genetic maps of the human genome. We
also physically mapped this gene close to the MDM2
locus on chromosome band 12q15. Finally, we describe
the organization of the Ifg, Myf-6, Mdm1, and Mdm2
loci on mouse chromosome 10, in a region syntenic to
human chromosome band 12q15. © Academic Press, Inc.

Interferon-γ (IFNG) was discovered because of its
antiviral property (12), although it is now studied
chiefly because of its central role in the regulation of
immune responses. This cytokine is secreted by CD4+
T cells committed to the Th1 pathway, by CD8+ T cells,
and by activated macrophages. Because of these multiple
functions, the gene coding for IFNG is a good candidate
gene for control of susceptibility to various infectious
as well as immune-mediated diseases of humans
and other mammals. The persistent infection of the
mouse central nervous system by Theiler's virus is a
case in point. The Ifg gene is a good candidate for control
of the persistence of the infection for two reasons:
(i) Persistence is controlled by a gene that was mapped
to the telomeric region of chromosome 10, close to the
Ifg locus (5). (ii) Resistant 129Sv mice whose gene coding
for the IFNG receptor has been inactivated become
susceptible (8).

A precise localization of the human IFNG gene was
not available until the present work. This gene had
been localized to band 12q24 using FISH and by screening
a panel of somatic hybrid cell lines (14, 19). Recently,
Ruiz-Linares (17) described a microsatellite in
the first intron of this gene, and Awata et al. (3) reported
a difference in the allelic distribution of this
marker between a group of patients with insulin-dependent
diabetes mellitus and a control group. For future
studies using the candidate gene approach, it
would be extremely useful to locate precisely the IFNG
gene, in particular with respect to polymorphic microsatellite
markers.

1 To whom correspondence should be addressed. Fax: 33 (1) 40 61

With this goal in mind, we screened the YAC library
from CEPH (1) by PCR using two primers (forward
5'-GCTGTTATAATTATAGCTTGT-3' and reverse
5'-AGGGTATTATTATACGAGCT-3') derived from those
described by Ruiz-Linares (17) for the IFNG microsatellite.
PCR was performed using a kit from Amersham.
After denaturation at 94°C for 2 min, 200 ng of DNA
in 25 µl was submitted to 40 cycles of amplification
(94°C, 40 s; 50°C, 40 s; 72°C, 15 s). Using this procedure,
we isolated clone 825G7 of the YAC library. We found
that clone 825G7 belonged to a contig of 25 other YAC
clones, according to inter-Alu PCR patterns described
in the CEPH-Généthon WWW server (6). All clones
in this contig contained either both microsatellites
D12S335 and D12S313 or one of the two. Each clone
was tested for the presence of the IFNG microsatellite.
Eleven clones contained the IFNG microsatellite. Ten
also contained both the D12S335 and D12S313 microsatellites
(YAC clones 745A10, 743E2, 751A4, 809H4,
823D1, 870H3, 924E4, 926A6, and 983H8), and one,
YAC clone 763F1, contained only the D12S313 microsatellite.
Therefore, these results demonstrate that the
IFNG gene is physically linked to the D12S335 and
D12S313 microsatellites.
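
As a rough plausibility check on the PCR screen described above, the following Python sketch estimates a melting temperature for each IFNG primer and adds up the programmed hold times. It is only an illustration: the Wallace rule (2 °C per A/T, 4 °C per G/C) is an assumption introduced here and is not stated in the paper, the function names are hypothetical, and the primer sequences are taken as printed in this copy.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C).

    Assumption: this rule of thumb is used only for illustration; the paper
    does not say how the 50 deg C annealing temperature was chosen.
    """
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


def programmed_seconds(initial_s: int, cycles: int, step_seconds: tuple) -> int:
    """Total programmed hold time in seconds, ignoring ramp times between steps."""
    return initial_s + cycles * sum(step_seconds)


# Primer sequences as printed in the text (5' to 3').
IFNG_FORWARD = "GCTGTTATAATTATAGCTTGT"
IFNG_REVERSE = "AGGGTATTATTATACGAGCT"

tm_forward = wallace_tm(IFNG_FORWARD)
tm_reverse = wallace_tm(IFNG_REVERSE)

# 94 deg C for 2 min, then 40 cycles of 40 s + 40 s + 15 s, as stated in the text.
total_s = programmed_seconds(120, 40, (40, 40, 15))

print(f"forward Tm estimate: {tm_forward} C, reverse Tm estimate: {tm_reverse} C")
print(f"programmed time: {total_s} s ({total_s / 60:.1f} min)")
```

Under these assumptions both estimates come out a few degrees above the 50°C annealing step reported in the text, which is consistent with common practice but does not confirm how the authors selected that temperature.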

To confirm the position of the IFNG locus on the genetic
map, we analyzed the segregation of alleles of five microsatellite
markers in the eight families that have been
used to construct the Généthon map (10). Besides markers
D12S335 and D12S313, we used two more microsatellites
located on either side of the D12S335-D12S313
region. One was taken from a group of six cosegregating
markers (D12S104, D12S305, D12S355, D12S334,
D12S83, and D12S329) centromeric to the D12S335
marker and the other from a group of three cosegregating
markers (D12S344, D12S80, and D12S92) telomeric to
the D12S313 marker. For each family, a pair of markers
was chosen according to its polymorphism within the
family. The parental meioses were not informative for
two families. The pedigrees of the other six were analyzed
for the markers described above: IFNG, D12S335,
D12S313, and the two flanking microsatellites. The results
are presented in Fig. 1 for two families and summarized
in Table 1. No recombination was observed between
the D12S313 and the IFNG markers among 137 informative
meioses. Only one recombination was observed between
the D12S335 and the IFNG markers among 135
informative meioses. Analysis revealed that the flanking
markers were indeed less linked to the IFNG locus than
the D12S335 and D12S313 markers.

FIG. 1. Segregation of the alleles of IFNG and four microsatellite markers in two families. Maternal alleles [illegible]
of the microsatellites were obtained from the Généthon database. Dashes correspond to cases for which [illegible].
Striped areas indicate a recombination event, and boldface identifies alleles from each parent. The alleles of IFNG are numbered
according to the size of the amplified DNA, from the smaller to the larger fragment.
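
The genetic distances quoted in Table 1 can be approximated from the raw counts. The sketch below assumes the simplest estimator, in which the distance in cM equals the percentage of recombinant meioses (reasonable only for small recombination fractions); the paper does not state which estimator was used, and the function and variable names are hypothetical.

```python
def recombination_percent(recombinants: int, informative_meioses: int) -> float:
    """Percentage of recombinant meioses, read here as an approximate distance in cM.

    Assumption: no mapping function (e.g. Haldane or Kosambi) is applied.
    """
    if informative_meioses <= 0:
        raise ValueError("informative_meioses must be positive")
    if not 0 <= recombinants <= informative_meioses:
        raise ValueError("recombinants must lie between 0 and informative_meioses")
    return 100.0 * recombinants / informative_meioses


# (crossovers with IFNG, informative meioses), as listed in Table 1.
TABLE_1 = {
    "D12S329/D12S83": (5, 129),
    "D12S335": (1, 135),
    "D12S313": (0, 137),
    "D12S344/D12S92": (6, 137),
}

distances = {
    marker: recombination_percent(*counts) for marker, counts in TABLE_1.items()
}

for marker, cm in distances.items():
    print(f"{marker}: {cm:.2f} cM")
```

With this estimator the flanking markers and D12S313 agree with the tabulated values after rounding to one decimal place, whereas D12S335 comes out at roughly 0.74 cM against the 0.8 cM printed in Table 1, so the authors may have rounded differently or used another estimator.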

We next showed, by physical and genetic mapping,
that the IFNG locus is linked to the D12S313 and
D12S335 loci. Indeed, YAC clone 926A6 was recognized
by these three markers. This YAC clone is chimeric
and contains regions of both chromosome 12 (band
12q15) and chromosome 5, as shown by FISH (6, 9).
Therefore, our results localize the IFNG locus to band
12q15, which is different from the previously published
localization (band 12q24) (19). The region of human
chromosome 12 to which we localize the IFNG gene is
syntenic to a region of the mouse genome where we
and others have previously localized the mouse Ifg gene
(2, 5, 18). In our work, the positions of loci Mdm1, Ifg,
and Myf6 and of three microsatellites D10Mit10,
D10Mit14, and D10Mit164 from the Whitehead Institute
were determined using the progeny of an F1 (B10.S
× SJL/J) × B10.S backcross (Fig. 2) (5). This backcross
was typed for more than 100 loci that were distributed
among all of the autosomal chromosomes. The results
showed that the Mdm1, Ifg, and Myf6 loci were linked
to each other in the telomeric region of chromosome
10. Other authors have also reported that the Mdm1,
Mdm2, Mdm3, and Ifg loci are closely linked (2, 18).
Interestingly, the MDM2 locus has been localized by
FISH to human chromosome 12q14.3-q15 (11). Thus,
the analysis of the syntenic region of the mouse genome
agrees with our localization of the IFNG locus to human
chromosome 12q15.

TABLE 1
Genetic Localization of IFNG

Marker            No. crossing over with IFNG    No. of informative meioses
D12S329/D12S83    5 (3.9 cM)                     129
D12S335           1 (0.8 cM)                     135
D12S313           0 (0.0 cM)                     137
D12S344/D12S92    6 (4.4 cM)                     137

We confirmed the physical linkage of the MDM2 and
IFNG loci by screening, with the MDM2 PCR (11), the
contig of YAC clones recognized by at least one of the
D12S335, D12S313, and IFNG microsatellites. The
MDM2 marker recognized three YAC clones (751A4,
870H3, and 983H8) that were also recognized by the
D12S335, D12S313, and IFNG microsatellites. Thus,
the MDM2 and IFNG loci are physically linked. The
MDM2 gene, which codes for a p53-like protein, is located
in a region associated with several cancers (15,
16). New polymorphic markers for this region are of
particular interest since the only one available so far
is a NlaIV polymorphic restriction site (11).

The Myf6 locus is located in the region of mouse chromosome
10 syntenic to human chromosome band 12q15
(4). Therefore, we decided to localize the MYF-6 locus
physically. We screened the CEPH YAC library using
MYF-6-specific primers (forward
5'-AGACCTTCTCCACGCAGCAG-3' and reverse
5'-GCGAAATCTGTTGTGCAGCT-3') under PCR conditions identical to those
described above for the MDM2 marker. Two clones,
921C6 and 982A6, were isolated. We found that both
belonged to a contig of nine other YAC clones, according
to inter-Alu PCR patterns described in the CEPH-Généthon
WWW server (6). Three of them, clones
940B8, 937D9, and 949H4, contained the MYF-6
marker. According to the CEPH-Généthon server,
clones 940B8 and 937D9 contained the D12S106 microsatellite,
a marker 15 cM from the D12S313 microsatellite,
toward the telomere (10). Figure 3 shows the position
of these various markers on human chromosome
12q15 and mouse chromosome 10. The organization of
the syntenic regions is very similar. IFNG/D12S313
and D12S106 are 15 cM apart, whereas Ifg/Mdm1 and
Myf6 are 5 cM apart. However, since 1 cM is, on average,
equivalent to 1.7 Mb in the mouse and 1.0 Mb in
human, the physical distances between the markers in
mouse and in human could be of the same order.
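
The closing comparison above is a unit conversion, sketched below in Python. The conversion factors (1.0 Mb per cM in human, 1.7 Mb per cM in mouse) and the two genetic distances are the ones quoted in the text; the names are hypothetical, and the factors are genome-wide averages, so the results are order-of-magnitude estimates only.

```python
# Average physical size of 1 cM, as quoted in the text (Mb per cM).
MB_PER_CM = {"human": 1.0, "mouse": 1.7}


def physical_distance_mb(genetic_cm: float, species: str) -> float:
    """Convert a genetic distance in cM to an approximate physical distance in Mb."""
    return genetic_cm * MB_PER_CM[species]


# IFNG/D12S313 to D12S106 in human; Ifg/Mdm1 to Myf6 in mouse.
human_mb = physical_distance_mb(15, "human")
mouse_mb = physical_distance_mb(5, "mouse")
ratio = human_mb / mouse_mb

print(f"human: ~{human_mb:.1f} Mb, mouse: ~{mouse_mb:.1f} Mb, ratio: {ratio:.2f}")
```

On these averages the human interval is about 15 Mb and the mouse interval about 8.5 Mb, a less than twofold difference, which is what the text means by distances "of the same order".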

In conclusion, we have shown that the IFNG and the
MDM2 genes are close to each other and to the
D12S335 and D12S313 microsatellites. These markers
are most likely localized to chromosome bands
12q14.3-q15, in a region syntenic to the telomeric part
of mouse chromosome 10. This information will be extremely
useful in human genetic studies in which the
IFNG and MDM2 genes will be candidate genes.

FIG. 2. Segregation of the alleles of Ifg, Mdm1, Myf6, and some markers published by the Whitehead Institute (7) in 78 F1 (SJL/J ×
B10.S) × B10.S mice. The Ifg and Mdm1 markers were described in Bureau et al. (5). A microsatellite was found in the second intron of
[illegible] primers (forward 5'-GAAAGGGCACTGGGCTGTAC-3' and reverse 5'-CGCCGGATHWCTGITGCT-3')
(13). Black squares represent homozygous mice; white squares represent heterozygous mice. Numbers under each column represent the
number of mice with each genotype.

FIG. 3. Genetic maps of the syntenic regions of human chromosome
12q15 and mouse chromosome 10. The MDM2 and MYF6 genes
were localized by physical mapping only. The other human genes
and markers were localized by both physical and genetic mapping.

ACKNOWLEDGMENTS 

We thank C. Dib and J. Weissenbach for their help; C. Petit for
helpful discussion; and M. Gau for preparing the manuscript. This
work was supported by grants from the Institut Pasteur Fondation,
the ARSEP, the CNRS, the GREG, and the NMSS.

REFERENCES 

1. Albertsen, H. M., Abderrahim, H., Cann, H. M., Dausset, J., Le
Paslier, D., and Cohen, D. (1990). Construction and characterization
of a yeast artificial chromosome library containing seven
haploid human genome equivalents. Proc. Natl. Acad. Sci. USA
87: 4256-4260.

2. Ashar, H. R., Benson, K. F., Jenkins, N. A., Gilbert, D. J., Copeland,
N. G., and Chada, K. K. (1994). Ifg, Gli, Mdm1, Mdm2,
and Mdm3: Candidate genes for the mouse pg locus. Mamm.
Genome 5: 608-611.

3. Awata, T., Matsumoto, C., Urakami, T., Hagura, R., Amemiya,
S., and Kanazawa, Y. (1994). Association of polymorphism in
the interferon gamma gene with IDDM. Diabetologia 37: 1159-
1162.

4. Braun, T., Bober, E., Winter, B., Rosenthal, N., and Arnold,
H. H. (1990). Myf-6, a new member of the human gene family
of myogenic determination factors: Evidence for a gene cluster
on chromosome 12. EMBO J. 9: 821-831.

5. Bureau, J. F., Montagutelli, X., Bihl, F., Lefebvre, S., Guenet,
J. L., and Brahic, M. (1993). Mapping loci influencing the persistence
of Theiler's virus in the murine central nervous system.
Nature Genet. 5: 87-91.

6. Cohen, D., Chumakov, I., and Weissenbach, J. (1993). A first-
generation physical map of the human genome. Nature 366:
698-701.

7. Dietrich, W. F., Miller, J. C., Steen, R. G., Merchant, M., Damron,
D., Nahf, R., Gross, A., Joyce, D. C., Wessel, M., Dredge,
R. D., Marquis, A., Stein, L. D., Goodman, N., Page, D. C., and
Lander, E. S. (1994). A genetic map of the mouse with 4006
simple sequence length polymorphisms. Nature Genet. 7: 220-
225.

8. Fiette, L., Aubert, C., Muller, U., Huang, S., Aguet, M., and
Brahic, M. (1995). Theiler's virus infection of 129Sv mice that
lack the interferon alpha/beta or interferon gamma receptors.
J. Exp. Med. 181.

9. Francke, U. (1994). Digitized and differentially shaded human
chromosome ideograms for genomic applications. Cytogenet.
Cell Genet. 65: 206-219.

10. Gyapay, G., Morissette, J., Vignal, A., Dib, C., Fizames, C.,
Millasseau, P., Marc, S., Bernardi, G., Lathrop, M., and Weissenbach,
J. (1994). The 1993-94 Généthon human genetic linkage
map. Nature Genet. 7: 246-339.

11. Heighway, J., Mitchell, E. L. D., Jones, D., White, G. R. M.,
and Santibanez Koref, M. F. (1994). A transcribed polymorphism
and sub-localisation of MDM2. Hum. Genet. 93: 611-
612.

12. Isaacs, A., Lindenmann, J., and Valentine, R. C. (1957). Virus
interference. II. Some properties of interferon. Proc. R. Soc.
London B 147: 268-273.

13. Miner, J. H., and Wold, B. (1990). Herculin, a fourth member
of the myoD family of muscle regulatory genes. Proc. Natl. Acad.
Sci. USA 87: 1089-1093.

14. Naylor, S. L., Sakaguchi, A. Y., Shows, T. B., Law, M. L., Goeddel,
D. V., and Gray, P. W. (1983). Human immune interferon
gene is located on chromosome 12. J. Exp. Med. 57: 1020-1027.

15. Oliner, J. D., Kinzler, K. W., Meltzer, P. S., George, D. L., and
Vogelstein, B. (1992). Amplification of a gene encoding a p53-
associated protein in human sarcomas. Nature 358: 80-83.

16. Reifenberger, G., Lui, L., Ichimura, K., Schmidt, E. E., and
Collins, V. P. (1993). Amplification and overexpression of the
MDM2 gene in a subset of human malignant gliomas without
p53 mutations. Cancer Res. 53: 2736-2739.

17. Ruiz-Linares, A. (1993). Dinucleotide repeat polymorphism in
the interferon-gamma (IFNG) gene. Hum. Mol. Genet. 2: 1508.

18. Taylor, B. A., Rowe, L., and Grieco, D. (1992). Close linkage of
Mdm-1, a gene amplified and overexpressed in a transformed
3T3 cell line, with gamma interferon (Ifg) on Chromosome 10
of the mouse. Mamm. Genome 3: 700-704.

19. Trent, J. M., Olson, S., and Lawn, R. M. (1982). Chromosomal
localization of human leukocyte, fibroblast, and immune interferon
genes by means of in situ hybridization. Proc. Natl. Acad.
Sci. USA 79: 7809-7813.



EXHIBIT 3 



JR-10.003-US 



PATENT 




IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 



In re Application of: Ryan
Serial No.: 10/608,463 
Filed: June 27, 2003 



Group Art Unit: 1652 
Examiner: E. Slobodyansky 



FOR: ISOLATED GENOMIC POLYNUCLEOTIDE FRAGMENTS FROM 
CHROMOSOME 12 THAT ENCODE HUMAN CARBOXYPEPTIDASE M AND THE 
HUMAN MOUSE DOUBLE MINUTE 2 HOMOLOG 

Confirmation No.: 6428 



INFORMATION DISCLOSURE STATEMENT 

Commissioner for Patents 

P.O. Box 1450 

Alexandria, VA 22313-1450 



Sir: 

In accordance with 37 C.F.R. 1.56, 1.97 and 1.98, Applicants submit herewith 
references which they believe may be material to the patentability of this application and 
with respect to which there may be a duty to disclose in accordance with 37 C.F.R. 1.56.

While the references may be "material" under 37 C.F.R. 1.56, it is not intended to 
constitute an admission that the references are "prior art" unless specifically designated as 
such. 

The filing of this Information Disclosure Statement shall not be construed as a 
representation that no other material references than those listed exist or that a search has 
been conducted. 

The references are listed in PTO form 1449 which is in accordance with the 
requirements of M.P.E.P. 609. A copy of the references is also enclosed.

The references are as follows: 
U.S. Patent Documents 




Foreign Patent Documents 



Other Documents 

1. RIES et al., 2000, Cell 103:321-330

2. REHLI et al., 1995, J. Biol. Chem. 270:15644-9 

3. OLINER et al., 1992, Nature 358: 80-3 

4. TAN et al., 1989, J. Biol. Chem. 264: 13165-70 

It is respectfully requested that these references be considered by the Patent and 
Trademark Office in its examination of the above-identified application and be made of 
record therein. The Examiner is also invited to contact the Undersigned if there are any 
questions concerning this paper or the attached references. 

The Information Disclosure Statement submitted herewith is being filed 
[ ] before the mailing date of a first Office Action on the merits
[x] after the mailing date of a first Office Action on the merits. Please charge
the fee of $180 to credit card. Form PTO-2038 is attached. 

[] An International Search Report is enclosed. References cited in the 
International Search report are asterisked. 



Respectfully submitted, 





Cheryl H. Agris, Reg. No. 34,086
P.O. Box 806 
Pelham, N.Y. 10803 
(914)712-0093 




PTO/SB/08A (04-03)
Approved for use through 04/30/2003. OMB 0651-0031
U.S. Patent and Trademark Office; U.S. DEPARTMENT OF COMMERCE
Under the Paperwork Reduction Act of 1995, no persons are required
to respond to a collection of information unless it contains a valid OMB control number.



Substitute for form 1449A/PTO

INFORMATION DISCLOSURE
STATEMENT BY APPLICANT

(Use as many sheets as necessary)

Complete if Known

Application Number: 10/608,463
Filing Date: June 27, 2003
First Named Inventor: Ryan
Art Unit: 1652
Examiner Name: E. Slobodyansky
Attorney Docket Number: JR10003



U.S. PATENT DOCUMENTS

Columns: Examiner Initials*; Cite No.; Document Number (Number-Kind Code, if known);
Publication Date (MM-DD-YYYY); Name of Patentee or Applicant of Cited Document;
Pages, Columns, Lines, Where Relevant Passages or Relevant Figures Appear


FOREIGN PATENT DOCUMENTS

Columns: Examiner Initials*; Cite No.; Foreign Patent Document (Country Code, Number,
Kind Code, if known); Publication Date (MM-DD-YYYY); Name of Patentee or Applicant
of Cited Document; Pages, Columns, Lines, Where Relevant Passages or Relevant
Figures Appear; T

Examiner Signature:                    Date Considered:

*EXAMINER: Initial if reference considered, whether or not citation is in conformance with MPEP 609. Draw line through citation if not
in conformance and not considered. Include copy of this form with next communication to applicant. Applicant's unique citation
designation number (optional). See Kinds Codes of USPTO Patent Documents at www.uspto.gov or MPEP 901.04. Enter Office that
issued the document, by the two-letter code (WIPO Standard ST.3). For Japanese patent documents, the indication of the year of the
reign of the Emperor must precede the serial number of the patent document. Kind of document by the appropriate symbols as
indicated on the document under WIPO Standard ST.16 if possible. Applicant is to place a check mark here if English language
Translation is attached.

This collection of information is required by 37 CFR 1.97 and 1.98. The information is required to obtain or retain a benefit by the public which is to file (and by the
USPTO to process) an application. Confidentiality is governed by 35 U.S.C. 122 and 37 CFR 1.14. This collection is estimated to take 2 hours to complete,
including gathering, preparing, and submitting the completed application form to the USPTO. Time will vary depending upon the individual case. Any comments
on the amount of time you require to complete this form and/or suggestions for reducing this burden, should be sent to the Chief Information Officer, U.S. Patent
and Trademark Office, U.S. Department of Commerce, Washington, DC 20231. DO NOT SEND FEES OR COMPLETED FORMS TO THIS ADDRESS. SEND
TO: Commissioner for Patents, Washington, DC 20231.

If you need assistance in completing the form, call 1-800-PTO-9199 (1-800-786-9199) and select option 2.




PTO/SB/08B (04-03)
Approved for use through 04/30/2003. OMB 0651-0031
U.S. Patent and Trademark Office; U.S. DEPARTMENT OF COMMERCE
Under the Paperwork Reduction Act of 1995, no persons are required to respond to a collection of information unless it contains a valid OMB control number.

INFORMATION DISCLOSURE
STATEMENT BY APPLICANT

(Use as many sheets as necessary)

Complete if Known

Application Number: 10/608,463
Filing Date: June 27, 2003
First Named Inventor: Ryan
Art Unit: 1652
Examiner Name: E. Slobodyansky
Attorney Docket Number: JR10003

NON PATENT LITERATURE DOCUMENTS

Columns: Examiner Initials*; Cite No.; Include name of the author (in CAPITAL LETTERS),
title of the article (when appropriate), title of the item (book, magazine, journal, serial,
symposium, catalog, etc.), date, page(s), volume-issue number(s), publisher, city and/or
country where published.



RIES et al., 2000, Cell 103: 321-330. 



REHLI et al., 1995, J. Biol. Chem. 270: 15644-15649. 



OLINER et al., 1992, Nature 358:80-83. 



TAN et al., 1989, J. Biol. Chem. 264: 13165-13170.



Examiner Signature:                    Date Considered:

*EXAMINER: Initial if reference considered, whether or not citation is in conformance with MPEP 609. Draw line through citation if not
in conformance and not considered. Include copy of this form with next communication to applicant.

1 Applicant's unique citation designation number (optional). 2 Applicant is to place a check mark here if English language Translation is attached.
This collection of information is required by 37 CFR 1.98. The information is required to obtain or retain a benefit by the public which is to file (and by the USPTO
to process) an application. Confidentiality is governed by 35 U.S.C. 122 and 37 CFR 1.14. This collection is estimated to take 120 minutes to complete, including
gathering, preparing, and submitting the completed application form to the USPTO. Time will vary depending upon the individual case. Any comments on the
amount of time you require to complete this form and/or suggestions for reducing this burden, should be sent to the Chief Information Officer, U.S. Patent and
Trademark Office, U.S. Department of Commerce, Washington, DC 20231. DO NOT SEND FEES OR COMPLETED FORMS TO THIS ADDRESS. SEND TO:
Commissioner for Patents, Washington, DC 20231.

If you need assistance in completing the form, call 1-800-PTO-9199 (1-800-786-9199) and select option 2.



EXHIBIT 4 




Available online at www.sciencedirect.com

ELSEVIER Gene 338 (2004) 217-223

www.elsevier.com/locate/gene



Genomic organisation of the human MDM2 oncogene and 
relationship to its alternatively spliced mRNAs* 

Huiling Liang a,1, Helen Atkins a, Rana Abdel-Fattah a, Stephen N. Jones b, John Lunec a,*

a Cancer Research Unit, Medical School, University of Newcastle upon Tyne, Newcastle upon Tyne NE2 4HH, UK
b University of Massachusetts Medical School, 55 Lake Avenue North, Worcester, MA 01655, USA

Received 26 January 2004; received in revised form 26 April 2004; accepted 17 May 2004
Available online 21 July 2004
Received by A.J. van Wijnen



Abstract 

The MDM2 proto-oncogene, which encodes a protein that binds to the p53 tumour suppressor, has been found amplified and overexpressed
in a range of human tumours. Although the human MDM2 cDNA sequence has been reported, the genomic organisation of the human gene has
not been documented. We have previously reported the detection of five alternative internally deleted MDM2 transcripts in human tumours and
suggested these may represent alternatively spliced forms. Here we demonstrate two novel MDM2 transcripts with internal deletions, using
RT-PCR followed by sequencing. To definitively ascribe these variant transcript forms to alternative splicing, and to explore associated
mechanisms, we have determined the intron-exon organisation of the human genomic sequence. The human MDM2 gene spans approximately
33 kb and is divided into 12 exons. Exon sizes range from 50 to ≥1161 bp and intron sizes vary from 121 to ~7000 bp. The positions of
intron-exon boundaries are compared with the deletion junctions of the multiple-sized transcripts and discussed in relation to alternative
splicing mechanism.

© 2004 Elsevier B.V. All rights reserved. 
Keywords: p53; genomic mapping; long range PCR 



1. Introduction 

The mdm2 proto-oncogene was initially identified as an 
amplified gene from a mouse double minute chromosome 
present in a spontaneously transformed Balb/C 3T3 cell 
line, 3T3DM (Haines et al., 1994; Sigalas et al., 1996; 
Steinman et al., 2004). The causal role of this gene in 
tumorigenesis was originally established by transfection 
studies using genomic DNA sequences. In these studies, 



Abbreviations: MDM2, mouse double minute 2 gene; RT-PCR, reverse 
transcription polymerase chain reaction; cDNA, complementary DNA; kb, 
kilobase; bp, base pair; TE, tris(hydroxymethyl)aminomethane 
ethylenediaminetetraacetate. 

* Database accession numbers: AF144014-AF144033 and AF201370-AF201371. 
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Medical School, University of Newcastle upon Tyne, Newcastle upon 
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Lake Avenue North, Worcester, MA 01655, USA. 
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experimental overexpression of mdm2 resulted in the 
immortalisation of primary rat embryo fibroblasts and 
induced a fully transformed phenotype in the cells when 
cotransfected with an activated ras gene (Finlay, 1993). The 
human homologue of the MDM2 gene has been found to be 
amplified in over 30% of human sarcomas (Oliner et al., 
1992; Leach et al., 1993), which consequently results in 
high levels of the MDM2 gene product. In addition, MDM2 
overexpression can also occur through enhanced transcrip- 
tion and translation (Bueso-Ramos et al., 1993; Landers et 
al., 1997; Momand et al., 1998). 

The human MDM2 gene has been localised to chromosome 
12q13-14. Although the human MDM2 cDNA sequence 
has been previously reported (Oliner et al., 1992), 
little is known about its genomic organisation. The MDM2 
protein is composed of 491 amino acids and contains a p53 
binding domain (codons 19-102), a putative nuclear local- 
isation signal (codons 181-185), an acidic domain (codons 
223-274), a central zinc-finger motif (codons 305-332) and 
a ring-finger motif towards the C-terminal end of the protein 
(codons 438-478) (Boddy et al., 1994). 

MDM2 appears to be a pluripotential oncoprotein, exerting 
its transforming properties through several alternative 
mechanisms, of which the most extensively studied has 
been the negative regulation of p53 function. MDM2 blocks 
p53 transcriptional function by binding to p53 (Momand et 
al., 1992; Oliner et al., 1993). The binding of MDM2 to p53 
also results in the rapid degradation of p53 (Haupt et al., 
1997; Kubbutat et al., 1997). In addition, MDM2 has been 
reported to have p53-independent tumorigenic properties. 
These include the ability to interact with and inactivate the 
pRb tumour suppressor protein (Xiao et al., 1995) and to 
bind to and activate the E2F1 transcription factor (Martin et 
al., 1995). Furthermore, two independent transgenic studies 
have shown MDM2 to have tumourigenic properties in p53 
null mice (Lundgren et al., 1997; Jones et al., 1998). 

One of the distinctive properties of MDM2 is the possession 
of an extremely complex expression pattern. Its 
multiple-sized transcripts and proteins have been found in 
tumour samples and cell lines by a number of groups 
(Haines et al., 1994; Sigalas et al., 1996; Bartel et al., 
2002). In our previous studies, five alternatively sized 
transcripts of the human MDM2 were found in human 
ovarian tumour, bladder tumour and leukaemic cell samples 
(Sigalas et al., 1996). The expression of the alternatively 
sized forms was found to be more frequent in tumours of 
advanced stage and high histological grade, and they also 
retained their ability to transform NIH3T3 cells. Here, we 
present data demonstrating two further MDM2 transcript 
forms with internal sequence deletions in human tumour 
tissue. We hypothesised that these transcripts are generated 
by alternative splicing. To test this hypothesis and to explore 
the associated mechanisms, we have investigated the genomic 
structure and organisation of the human MDM2 gene. 
This gene is ~33 kb in length and comprises at least 12 
exons. The sizes of exons vary from 50 to ≥1161 bp, and 
introns range in size from 121 to ~7000 bp. The position 
of intron-exon boundaries is compared with the sequences 
of the MDM2 variant transcripts and discussed in relation to 
alternative splicing mechanisms. 



2. Materials and methods 

2.1. Nested RT-PCR 

Total RNA was extracted from human bladder tumour and 
normal bladder tissues. Nested RT-PCR was carried out as 
previously described (Sigalas et al., 1996). 

2.2. Genomic DNA extraction 

Genomic DNA was prepared from frozen normal human 
placental tissue by digestion with proteinase K and 
phenol-chloroform extraction. The DNA was precipitated 
with a half volume of 7.5 M ammonium acetate and one 
volume of isopropanol, washed in 70% ethanol and 
resuspended in 1× TE buffer (10 mM Tris, 0.1 mM 
EDTA pH 7.5). 

2.3. Long-range PCR 

Primers (Table 1) were designed from the published 
MDM2 cDNA sequence (Oliner et al., 1992). Each primer 
pair was designed to cross the deletion junctions of 
multiple-sized transcripts (Sigalas et al., 1996) or according 
to the predicted exon/intron boundaries by referring to 
the mouse mdm2 gene structure (Jones et al., 1996; 
Montes de Oca Luna et al., 1996). A long-range PCR 
protocol was carried out, with the above human genomic 
DNA as a template, using an XL PCR Kit (Perkin Elmer, 
Part No. N808-192). For comparison, PCR was also 
carried out on normal human placental cDNA. The reaction 
contained 1× reaction buffer, 0.8 mM dNTP, 1.1 mM 
Mg(OAc)2 and 4 units of rTth DNA polymerase, 40 pmol 
of each primer and 100 ng of genomic DNA or 20 µl 
cDNA in a total volume of 100 µl. The long-range 
PCR was performed using a thermal cycler (Perkin-Elmer 
Model 480) as follows: 94 °C for 2 min; cycles 1-16 at 
94 °C for 30 s, 58-62 °C for 10 min; cycles 17-28 at 
94 °C for 30 s, 58-62 °C for 10 min and 15 s of increment 



Table 1 

The sequences of primer pairs for amplification of genomic nucleotides of the human MDM2 gene 

Primer  Forward primer (5'-3')       cDNA position  Reverse primer (5'-3')       Position at   Annealing 
name                                 (nt) a                                      cDNA (nt) a   temperature (°C) 

Pri1    cctgtgtggccctgtgtgtc         30-49          tgttccgaagctggaatctgtg       332-361 
Pri2    gtgcaataccaacatgtctg         314-333        caacagactttaataacttcaaaagc   435-410 
Pri3    gcttttgaagttattaaagtctgttg   410-435        tacaatatgttgttgcnctcatc      536-513 
Pri4    mtatcttggccagtatanatg        474-497        aticctgctgattgactactacc      654-632 
Pri5    catgatctacaggaacttggtag      614-636        ngatcactcccaccttcaagg        716-695 
Pri6    actcaggtacatctgtgagigag      661-683        tgtctcactaattgctctccttc      815-793 
Pri7    ugtacaagagcttcaggaagag       724-746        atggcgtccctgtagaacac         969-949 
Pri8    tgaaagcctggctctgtgtg         893-912        caaattctacactaaactgatctg     1059-1036 
Pri9    tcitgatgctggtgtaagtgaac      980-1002       agctaaggaaatttcaggatcttc     1211-1188 
Pri10   gtgatacagattcatttgaagaag     1168-1191      catctgagcaatgtgatggaag       1277-1254 
Pri11   ctattggaaatgcacttcatgc       1213-1235      cggtggctcatgcctgtaatc        2372-2351 

a Sequence of MDM2 cDNA clones (Oliner et al., 1992). 
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per cycle, with a final extension of 72 
the last cycle. 

PCR products were separated by electrophoresis on a 1 % 
low melting temperature agarose gel (NuSieve GTG, Flow- 
gen) and visualised by ethidium bromide staining with UV 
transillumination. DNA bands amplified from genomic 
DNA were excised and purified with a QIAquick Gel 
Extraction kit (Qiagen), by following the procedure recom- 
mended by the company. 

2.4. Cloning PCR products 

Purified PCR products were subcloned directly into the 
pGEM-T Easy vector (Promega) following the protocol 
recommended by the company. Ligation products were 
transformed into Escherichia coli JM109 and clones con- 
taining the desired inserts were identified by PCR screening. 
Plasmids were prepared by using the Wizard Plus SV 
Miniprep system (Promega). 

2.5. Sequencing 

Sequencing was carried out manually by using PCR 
product directly as a template or automatically by using 
plasmid PCR product clone as a template. Manual sequencing 
was performed using the Sequenase Version 2.0 DNA 
ing was performed using the Sequenase Version 2.0 DNA 
sequencing system (Amersham, Product No.70770). The 
automated sequencing was carried out in the central core 
facility at the University of Newcastle upon Tyne Medical 
Faculty. The sequences were aligned with the published 
MDM2 cDNA sequence, using the DNASTAR sequence 
analysis software package. 



3. Results 

3.1. The detection of multiple-sized MDM2 transcripts in 
human tumours 

In our previous studies, we have found five alternative-sized 
MDM2 transcripts (MDM2-a, -b, -c, -d and -e) 
(Sigalas et al., 1996). Our present investigation of the 
MDM2 transcriptional pattern using RT-PCR has revealed 
two further transcripts, sized 813 and 707 bp, in human 
bladder tumour samples but not in normal bladder tissue 
(data not shown); we have designated these MDM2-a1 and 
-g (Fig. 1). Sequencing shows that these two transcripts 
have internal sequence deletions: MDM2-a1 lacks nucleotides 
from codons 28 to 222 and codons 275 to 300; 
MDM2-g lacks nucleotides from codons 28 to 97 1/3 and 
114 2/3 to 300. Fig. 2 shows the structure of these transcripts 
in relation to the full-length MDM2 cDNA sequence 
and previously described variant transcripts (see 
GenBank accession numbers AF201370 and AF201371 for 
details of the sequences). 



[Gel image; size markers at 1 kb and 0.5 kb] 

Fig. 1. MDM2 transcripts were amplified by RT-PCR with nested primers 
that flank the MDM2 coding region, as previously described (Sigalas et al., 
1996; Matsumoto et al., 1998). M: molecular weight marker. 

3.2. MDM2 genomic structure and organisation 

We have used long-range PCR amplification, followed by 
cloning and sequencing to investigate the organisation of the 
human MDM2 gene and in particular to define intron-exon 
boundaries and flanking intronic sequences. Eleven DNA 
fragments were amplified from genomic DNA with the 
primer pairs shown in Table 1, which match the known 
MDM2 cDNA sequences. Comparison of the sequences of 
these PCR products with the published sequences of MDM2 
cDNA clones reveals that the Pri1 primer pair spans introns 1 
and 2, while primer pairs Pri2-10 cover one intron per primer 
pair. However, the region flanked by primer pair Pri11, 
covering from the 1235th to 2351st nucleotide of the 
MDM2 cDNA clone sequence (Oliner et al., 1992), was 
found not to contain any intronic sequence. Sequence analysis 
indicates that MDM2 spans approximately 33 kb of 
genomic DNA and is separated by 11 introns. Exons range 
in size from 50 to ≥1161 bp. The size of the introns varies 
from 121 to ~7000 bp (Table 2). Exon-intron boundary 
sequences of the 5' and 3' splice sites follow the "GT and 
AG" rule (Table 3; see GenBank accession numbers 
AF144014-AF144033 for additional intronic sequence data). 
Table 4 shows the 3' ends of the intronic sequences adjacent 
to the intron-exon boundaries, including branch sites and 
polypyrimidine tracts. The sequences of branch sites have a 
good match with the consensus sequence YURA*Y (Y: 
pyrimidine; R: purine; A*: branch point residue). The dis- 
tances between the branch points and the 3' splice sites vary 
from IS to 111 bp. The C/T content in the polypyrimidine 
tracts ranges from 53% to 90%. 
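
The GT-AG rule and the C/T percentages reported above can be checked mechanically. The following is a minimal Python sketch, not the authors' method: the function names are hypothetical, the two strings are the intron 2 flanks as printed in Table 3, and the six bases preceding the AG are used only as a stand-in for a polypyrimidine tract (the actual tract boundaries are those of Table 4).

```python
def follows_gt_ag(intron_5prime: str, intron_3prime: str) -> bool:
    """Return True if the intron starts with GT and ends with AG."""
    return (intron_5prime.lower().startswith("gt")
            and intron_3prime.lower().endswith("ag"))


def ct_content(tract: str) -> float:
    """Return the percentage of C and T residues in a sequence."""
    tract = tract.lower()
    if not tract:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "ct" for base in tract) / len(tract)


# Intron 2 flanks as printed in Table 3 (intron sequence in lower case).
five_prime_flank = "gtaagcac"
three_prime_flank = "ccttgtag"

print(follows_gt_ag(five_prime_flank, three_prime_flank))
# Illustrative assumption: treat the bases before the terminal AG as the tract.
print(round(ct_content(three_prime_flank[:-2]), 1))
```

The same two functions could be applied to each row of Tables 3 and 4 once the sequences are available in clean form.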

Comparison of the structure and organisation of the human 
MDM2 gene described here with that published for the mouse 
gene (Jones et al., 1996) indicates that the number of the 
exons and introns is the same. The size of the coding exons is 
similar (Table 2). However, the sizes of the noncoding trans- 
cribed regions, including exons 1, 2 and 12 and the introns, 
differ substantially, with the exception of introns 1 and 3. 

3.3. Analysis of alternatively spliced variants of MDM2 
mRNA 

We have detected seven MDM2 transcript variants 
(MDM2-a, -a1, -b, -c, -d, -e and -g) previously and 
Table 3 

Sequence of the human MDM2 exon/intron boundaries 

Exon no.                                                      Exon no. 

1     GGAGCAGgtgctggc -- intron 1 -- tttcccagCTCTuT*          2 

2     GATCCAGgtaagcac -- intron 2 -- ccttgtagGCAAAjg          3 

3     GACCOfigttagtat -- intron 3 -- tcttatagGTrAGAC          4 
           27                                28 

4     GAAAg&Sgtaagctg -- intron 4 -- tatttcagGJ3CTTT          5 
           52 

5     AG£A£AGgtaattct -- intron 5 -- cctacaagGAA£ATA          6 
           96                                98 

6     CAGeAGGgtaagtta -- intron 6 -- tctctcagAAl£ATC          7 
           113 

7     TCAAaaggtaateea -- intron 7 -- aegcctagSBCCTTG          8 
           136 

8     CAGACACotatatat -- intron 8 -- atatccagAR6fifiAA        9 
           167 

9     GAATQCSgcoacgtt -- intron 9 -- tgttttagGAJCTTG          10 
           222                               223 

10    TGATSRGgtatatat -- intron 10 -- tttattagQTATATc         11 
           274                               275 

11    CTTAGdgtaagtat -- intron 11 -- cattgaagGA£TATT          12 
           300                               301 



underneath). 



exon 6, has a shifted reading frame after codon 28. The 
MDM2-e variant, which also involves a deletion junction 
with an interrupted codon sequence, has a shift in the 
reading frame after codon 484. 

4. Discussion 

In our previous studies and the results presented here, 
we detected multiple-sized MDM2 transcripts in human 
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>40 59 
78 



'exon 4 



>40 53 

ttgatggaiatgtngctg^gggcciaiagntiggBai^uag^ttueuiu^^v-b^^..^ a >4 0 go.5 

cTaacatttagttaicUtaatgctcagaatcatarttgtatucag/exon 5 >40 

naaiacaaatttttanctaaaatgucatctcngttaimmtttttctgtc 6 <40 90 

ctgatccrattctmctctcag/exon 7 
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8 ^mugaactattttattgaaac^ 9 ^ ^ 

9 gtgatgmatcaaauatatatttuttcngtmag/exon 10 <40 53 

10 ctaatgaaigtgtmattag/exon 11 <40 61 

11 ctgactgtgtgtcttamcaKgaag/exon 12 

Branch site sequences are indicated with underlined letters. /: boundary between intron and exon. 

' The percentage of T and C content in polypyrimidine tracts. 
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tumour tissues, but not in normal tissues. Detection of 
alternative spliced forms of MDM2 mRNAs varied, and 
appeared to be relatively abundant in most samples. 
These variants encoded protein products in vitro and 
were found to transform NIH3T3 cells and to be associ- 
ated with high-grade and late-stage human cancer (Sigalas 
et al., 1996 and unpublished data). Our data were 
supported by the observations of other groups, who 
reported that alternatively spliced mdm2 transcripts 
promoted tumour formation in a mouse model (Fridman et al., 
2003; Steinman et al., 2004). These collective observations 
suggest that alternatively spliced forms of MDM2, 
encoding alternative proteins with differing functional 
capabilities, may play an important role in tumour 
development. The human MDM2 genomic map presented here 
enables us to relate the genomic structure and organisation 
of MDM2 to the appearance of variant transcript forms 
and provides a basis for considering alternative splicing 
mechanisms. 

Pre-mRNA splicing involves precise excision of intron 
sequences and the ligation of exon sequences. In cases of 
alternative splicing, the excision may occur at cryptic splice 
sites; exons may be skipped and introns may be retained. 
The organisation of the exon-intron boundaries of the 
human MDM2 gene indicates that the MDM2-a, -a1, -b, -c 
and -g variant transcript forms (Fig. 2) result from skipping 
of multiple entire exons, because the internal deletion 
junctions correspond exactly to the location of exon-intron 
boundaries. However, the deletion junctions of the MDM2-d 
and -e forms do not correspond to the boundaries between 
exons and introns, and there are no consensus splicing 
sequences surrounding them to indicate the possible use of 
cryptic splice sites. This suggests that they may have 
resulted from an unusual and possibly aberrant splicing 
mechanism. 
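
The exon-skipping argument above amounts to checking that each deleted block begins at the first codon of an exon and ends at the last codon of an exon. The sketch below illustrates that check under stated assumptions: it uses only the four intron positions whose flanking codon numbers are legible in Table 3 (introns 3, 9, 10 and 11), and the names `INTRON_BOUNDARIES` and `deletion_matches_exon_boundaries` are hypothetical.

```python
# Codon numbers flanking each intron, as legible in Table 3:
# intron number -> (last codon of upstream exon, first codon of downstream exon).
INTRON_BOUNDARIES = {
    3: (27, 28),
    9: (222, 223),
    10: (274, 275),
    11: (300, 301),
}


def deletion_matches_exon_boundaries(first_deleted, last_deleted,
                                     boundaries=INTRON_BOUNDARIES):
    """True if a deleted codon block starts and ends exactly at exon edges."""
    exon_starts = {downstream for _, downstream in boundaries.values()}
    exon_ends = {upstream for upstream, _ in boundaries.values()}
    return first_deleted in exon_starts and last_deleted in exon_ends


# MDM2-a1 is reported to lack codons 28-222 and codons 275-300.
for first, last in [(28, 222), (275, 300)]:
    print(first, last, deletion_matches_exon_boundaries(first, last))
```

A deletion whose ends fall inside an exon, as described for the MDM2-d and -e forms, would fail this test.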

The regulation of alternative splicing involves both cis 
elements and trans-acting factors. The cis elements include 
the 5' and 3' splice sites, a branch site and a 
polypyrimidine tract between the branch point and the 3' 
splice site. It has been demonstrated that a short distance 
between the branch point and the 3' splice site and high 
C/T content at polypyrimidine tracts give rise to high 
efficiency of splicing in mammals (Helfman et al., 1988; 
Libri et al., 1989). Our data show that all the 5' and 3' 
splice sites of the human MDM2 gene obey the "GT" 
and "AG" rule. The branch sites also have a good match 
with the consensus sequence. However, the distances 
between branch points and 3' sites, and the percentage 
C/T content vary between introns. The exons most 
frequently retained in the splice variants have shorter 
distances between their upstream branch points and 3' 
splice sites and/or a higher percentage C/T content in 
their upstream polypyrimidine tracts, compared with those 
exons commonly excluded (Table 2). This suggests that a 
short distance between the branch point and the 3' site and 
a polypyrimidine tract with high C/T content confer 



high splicing efficiency to their adjacent splicing sites, 
which is consistent with the observations reported by 
Smith et al. (1989), Goux-Pelletan et al. (1990) and Mueller 
et al. (1997). However, exon 7 was spliced out with a 
high frequency although there is only a short distance 
between its branch point and 3' splice site and there is 
also a high C/T content in its upstream polypyrimidine 
tract. The explanation may be that it is spliced out along 
with upstream exons. The order of intron removal is 
governed by preferential binding of splice factors rather 
than by sequential numerical order (Lewin, 1994). It 
may be possible that exon 7 is processed for splicing before 
exon 6, and the 5' end of exon 7 may in some 
circumstances be ligated preferentially to the 3' end of 
exon 6. If exon 6 has low splice strength, then in the 
process of ligation with its upstream exon, it may be 
outcompeted and skipped together with exon 7. 

Currently, we are not clear why the alternative spliced 
transcripts appear preferentially in tumour samples, espe- 
cially in advanced stage and high grade, but not in normal 
tissues (Sigalas et al., 1996; Bartel et al., 2002). Although 
variant MDM2 spliced transcripts have been reported in 
normal tissues in one study (Bartel et al., 2004), we failed to 
detect these isoforms in noncancerous tissues. It has been 
proposed that an mRNA surveillance system exists in cells, 
which protects them from errors of transcription, mRNA 
processing, or mRNA transport (Pulak and Anderson, 
1993). Mistakes are not uncommon in splicing of RNA 
from complex genes. Exons can occasionally be skipped 
(Nigro et al., 1991). In the normal situation, the surveillance 
system would probably degrade most mRNA with splicing 
errors as they are transported to the cytoplasm. We speculate 
that in cancer cells, this system may not function correctly 
and, consequently, the splice variants may escape degradation. 
It is also possible that there are mutations in the intron 
region that cause alternative splicing and that the presence of 
the variants contributes to the cancer. It would be of interest to 
test this hypothesis by investigating intron nucleotide 
sequences in tumour samples that show expression of 
alternatively spliced forms. However, other models can be 
envisaged; for instance, we cannot rule out the possibility 
that trans-acting factors are involved in the alternative RNA 
processing by blocking some splice sites and/or enhancing 
other splice sites. 

In conclusion, we have detected two novel MDM2 
alternatively spliced transcripts and have also defined the 
structure and organisation of the human MDM2 gene. In 
addition we have related this information to potential 
mechanisms by which alternatively sized MDM2 transcripts 
are generated. As the alternatively spliced MDM2 mRNAs 
have been shown to possess oncogenic potential and to 
correlate with advanced malignancy and tumour progression 
(Haines et al., 1994; Sigalas et al., 1996; Steinman et al., in 
press), our data may assist in clinical diagnosis of sarcomas 
displaying MDM2 amplification and alternative splicing of 
MDM2 transcripts. 



H. Liang et al. / Gene 338 (2004) 217-223 



Acknowledgements 

We thank Ratchada Suaeyun for her assistance with the 
work presented in this paper. 

References 

Bartel, F., Taubert, H., Harris, L.C., 2002. Alternative and aberrant splicing of 
MDM2 mRNA in human cancer. Cancer Cell 2, 9-15. 

Bartel, F., Pinkert, D., Fiedler, W., Kappler, M., Wurl, P., Schmidt, H., 
Taubert, H., 2004. Expression of alternatively and aberrantly spliced 
transcripts of the MDM2 mRNA is not tumor-specific. Int. J. Oncol. 24, 
143-151. 

Boddy, M.N., Freemont, P.S., Borden, K.L., 1994. The p53-associated 
protein MDM2 contains a newly characterized zinc-binding domain 
called the RING finger. Trends Biochem. Sci. 19, 198-199. 

Bueso-Ramos, C., Yang, Y., Deleon, E., McCown, P., Stass, S., Albitar, M., 
1993. The human MDM-2 oncogene is overexpressed in leukemias. 
Blood 82, 2617-2623. 

Finlay, C.A., 1993. The mdm-2 oncogene can overcome wild-type p53 
suppression of transformed cell growth. Mol. Cell. Biol. 13, 301-306. 

Fridman, J., Hernando, E., Hemann, M., Stanchina, E., Cordon-Cardo, S., 
Lowe, S., 2003. Tumor promotion by Mdm2 splice variants unable to 
bind p53. Cancer Res. 63, 5703-5706. 

Goux-Pelletan, M., Libri, D., d'Aubenton-Carafa, Y., Fiszman, M., Brody, 
E., Marie, J., 1990. In vitro splicing of mutually exclusive exons from 
the chicken beta-tropomyosin gene: role of the branch point location and 
very long pyrimidine stretch. EMBO J. 9, 241-249. 

Haines, D.S., Landers, J.E., Engle, L.J., George, D.L., 1994. Physical and 
functional interaction between wild-type p53 and mdm2 proteins. Mol. 
Cell. Biol. 14, 1171-1178. 

Haupt, Y., Maya, R., Kazaz, A., Oren, M., 1997. Mdm2 promotes the rapid 
degradation of p53. Nature 387, 296-299. 

Helfman, D.M., Ricci, W.M., Finn, L.A., 1988. Alternative splicing of 
tropomyosin pre-mRNAs in vitro and in vivo. Genes Dev. 2 (12A), 
1627-1638. 

Jones, S.N., Ansari-Lari, M.A., Hancock, A.R., Jones, W.J., Gibbs, R.A., 
Donehower, L.A., Bradley, A., 1996. Genomic organization of the 
mouse double minute 2 gene. Gene 175, 209-213. 

Jones, S.N., Hancock, A.R., Vogel, H., Donehower, L.A., Bradley, A., 
1998. Overexpression of mdm2 in mice reveals a p53-independent role 
for mdm2 in tumorigenesis. Proc. Natl. Acad. Sci. U. S. A. 95, 
15608-15612. 

Kubbutat, M.H., Jones, S.N., Vousden, K.H., 1997. Regulation of p53 
stability by Mdm2. Nature 387, 299-303. 

Landers, J.E., Cassel, S.L., George, D.L., 1997. Translational enhancement 
of mdm2 oncogene expression in human tumor cells containing a 
stabilized wild-type p53 protein. Cancer Res. 57, 3562-3568. 

Leach, F.S., Tokino, T., Meltzer, P., Burrell, M., Oliner, J.D., Smith, S., 
Hill, D.E., Sidransky, D., Kinzler, K.W., Vogelstein, B., 1993. p53 
Mutation and MDM2 amplification in human soft tissue sarcomas. 
Cancer Res. 53, 2231-2234. 

Lewin, B., 1994. The apparatus for nuclear splicing. In: Lewin, B. (Ed.), 
Genes V. Oxford Univ. Press, Oxford, pp. 914-940. 

Libri, D., Marie, J., Brody, E., Fiszman, M.Y., 1989. A subfragment of the 
beta tropomyosin gene is alternatively spliced when transfected into 
differentiating muscle cells. Nucleic Acids Res. 17, 6449-6462. 

Lundgren, K., Montes de Oca Luna, R., McNeill, Y.B., Emerick, E.P., 
Spencer, B., Barfield, C.R., Lozano, G., Rosenberg, M.P., Finlay, 
C.A., 1997. Targeted expression of MDM2 uncouples S phase from 
mitosis and inhibits mammary gland development independent of 
p53. Genes Dev. 11, 714-725. 

Martin, K., Trouche, D., Hagemeier, C., Sorensen, T.S., La Thangue, N.B., 
Kouzarides, T., 1995. Stimulation of E2F1/DP1 transcriptional activity 
by MDM2 oncoprotein. Nature 375, 691-694. 

Matsumoto, R., Tada, M., Nozaki, M., Zhang, C.L., Sawamura, Y., Abe, 
H., 1998. Short alternative splice transcripts of the mdm2 oncogene 
correlate to malignancy in human astrocytic neoplasms. Cancer Res. 
58, 609-613. 

Momand, J., Jung, D., Wilczynski, S., Niland, J., 1998. The MDM2 gene 
amplification database. Nucleic Acids Res. 26, 3453-3459. 

Momand, J., Zambetti, G.P., Olson, D.C., George, D., Levine, A.J., 
1992. The mdm-2 oncogene product forms a complex with the 
p53 protein and inhibits p53-mediated transactivation. Cell 69, 
1237-1245. 

Montes de Oca Luna, R., Tabor, A.D., Eberspaecher, H., Hulboy, D.L., 
Worth, L.L., Colman, M.S., Finlay, C.A., Lozano, G., 1996. The 
organization and expression of the mdm2 gene. Genomics 33, 352-357. 

Mueller, A., Odze, R., Jenkins, T.D., Shahsesfaei, A., Nakagawa, H., 
Inomoto, T., Rustgi, A.K., 1997. A transgenic mouse model with cyclin D1 
overexpression results in cell cycle, epidermal growth factor receptor, 
and p53 abnormalities. Cancer Res. 57, 5542-5549. 

Nigro, J.M., Cho, K.R., Fearon, E.R., Kern, S.E., Ruppert, J.M., Oliner, 
J.D., Kinzler, K.W., Vogelstein, B., 1991. Scrambled exons. Cell 64, 
607-613. 

Oliner, J.D., Kinzler, K.W., Meltzer, P.S., George, D.L., Vogelstein, B., 
1992. Amplification of a gene encoding a p53-associated protein in 
human sarcomas [see comments]. Nature 358, 80-83. 

Oliner, J.D., Pietenpol, J.A., Thiagalingam, S., Gyuris, J., Kinzler, K.W., 
Vogelstein, B., 1993. Oncoprotein MDM2 conceals the activation 
domain of tumour suppressor p53. Nature 362, 857-860. 

Pulak, R., Anderson, P., 1993. mRNA surveillance by the Caenorhabditis 
elegans smg genes. Genes Dev. 7, 1885-1897. 

Sigalas, I., Calvert, A.H., Anderson, J.J., Neal, D.E., Lunec, J., 1996. 
Alternatively spliced mdm2 transcripts with loss of p53 binding domain 
sequences: transforming ability and frequent detection in human cancer. 
Nat. Med. 2, 912-917. 

Smith, C.W.J., Porro, E.B., Patton, J.G., Nadal-Ginard, B., 1989. Scanning 
from an independently specified branch point defines the 3' splice site 
of mammalian introns. Nature 342, 243-247. 

Steinman, H., Burstein, E., Lengner, C., Gosselin, J., Pihan, G., Duckett, 
C., Jones, S., 2004. An alternative splice form of Mdm2 induces 
p53-independent cell growth and tumorigenesis. J. Biol. Chem. 279, 
4877-4886. 

Xiao, Z.X., Chen, J., Levine, A.J., Modjtahedi, N., Xing, J., Sellers, W.R., 
Livingston, D.M., 1995. Interaction between the retinoblastoma protein 
and the oncoprotein MDM2. Nature 375, 694-698. 



RELATED PROCEEDINGS APPENDIX 

None 



EXHIBIT 2 



Downloaded from www.genome.org on December 13, 2007 - Published by Cold Spring Harbor Laboratory Press 




Is "Junk" DNA Mostly Intron DNA? 

Gane Ka-Shu Wong, Douglas A. Passey, Ying-zong Huang, Zhiyong Yang and Jun Yu 

Genome Res. 2000 10: 1672-1678 

Access the most recent version at doi:10.1101/gr.148900 






© 2000 Cold Spring Harbor Laboratory Press 






First Glimpses/Report 

Is "Junk" DNA Mostly Intron DNA? 

Gane Ka-Shu Wong,1,3 Douglas A. Passey,1 Ying-zong Huang,1 Zhiyong Yang,1 and 
Jun Yu1,2 

1 Human Genome Center, Department of Medicine, University of Washington, Seattle, Washington 98195, USA; 2 Human 
Genome Center, Institute of Genetics, Chinese Academy of Sciences, Beijing, China 

Among higher eukaryotes, very little of the genome codes for protein. What is in the rest of the genome, or 
the "junk" DNA, that, in Homo sapiens, is estimated to be almost 97% of the genome? Is it possible that much of 
this "junk" is intron DNA? This is not a question that can be answered just by looking at the published data, 
even from the finished genomes. One cannot assume that there are no genes in a sequenced region, just because 
no genes were annotated. We introduce another approach to this problem, based on an analysis of the 
cDNA-to-genomic alignments, in all of the complete or nearly-complete genomes from the multicellular 
organisms. Our conclusion is that, in animals but not in plants, most of the "junk" is intron DNA. 



Among higher eukaryotes, very little of the genome 
codes for protein. What is in the rest of the genome, or 
the "junk" DNA, that, in Homo sapiens, is estimated to 
be almost 97% of the genome? If a region is gene-poor, 
is that because there are vast deserts of intergenic DNA 
between adjacent genes, or is that because the few 
genes that are there are large, with enormous introns? 

First, a few definitions are needed. We consider 
only the euchromatic portion of the genome. The het- 
erochromatic portion (e.g., centromeres and telo- 
meres) is highly repetitive and largely devoid of genes. 
It is extremely difficult to clone, extremely polymor- 
phic, and unlikely to be sequenced correctly anytime 
soon. We define the exons and introns as "intragenic" 
and everything else as "intergenic." This is not to im- 
ply that intergenic DNA is nonfunctional, especially as 
we have incorporated the promoters into our defini- 
tion. However, promoters are difficult to identify, 
whereas exons and introns are reliably identified by 
cDNA-to-genomic alignments. Lastly, we will use the 
term "genomic length" to indicate the sum of the ex- 
ons and introns in a given gene and "cDNA length" to 
indicate the sum of only the exons. 

Even after a genome is completely sequenced, it is 
not a straightforward matter to determine the inter- 
genic fraction. Indeed, any assessment that is based 
only on the fraction of the genome that has not been 
identified by the gene annotations is likely to be an 
overestimate of the underlying reality. Consider how 
the genes are annotated. Most current procedures (The 
Caenorhabditis elegans Sequencing Consortium 1998; 
Dunham et al. 1999; Lin et al. 1999; Mayer et al. 1999; 
Adams et al. 2000; Hattori et al. 2000) employ a com- 
bination of EST/cDNA/ protein alignments and ab ini- 
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tio exon-prediction programs. Given the incomplete 
state of the EST/cDN A/protein data, most of the anno- 
tated exons are in fact based on the exon-prediction 
programs, even if parts of certain genes are confirmed 
by the experimental data. There are two problems (Bur- 
set and Guigo 1996; Reese et al. 2000). One is that the 
exon-prediction programs cannot identify untrans- 
lated non-coding exons (i.e., the UTRs). The second, 
more important, issue is that these programs are not 
particularly proficient at identifying large genes. There 
are three reasons: (1) The signal-to-noise ratio can be as
low as 1/1000, for the extreme case of a 100-bp exon
juxtaposed with a 100-kb intron; (2) the data sets
used to train these programs tend to underrepresent
the large difficult-to-sequence genes; and (3) the 
codon-usage statistics, by which the exons are initially 
identified, are not as informative for the large genes of 
certain organisms (Wright 1990). 

The extent of the large-genes problem is organism 
dependent. The determinant is the distribution of ge- 
nomic lengths. If the genomic lengths are distributed 
over many orders of magnitude, then failure to anno- 
tate even a small fraction of the largest genes will leave 
a much larger fraction of the genome unannotated. In 
this scenario, there is a critical difference between the 
following two seemingly similar quantities: the frac- 
tion of the genes in the genome that is correctly iden- 
tified and the fraction of the genome sequence that is 
labeled as intragenic. The first quantity is far more 
likely to be correct than the second. It is possible that 
the total gene count is essentially correct, while, at the 
same time, the intragenic fraction is significantly un- 
derestimated and the intergenic fraction is signifi- 
cantly overestimated. Indeed, this is precisely the prob- 
lem for the animal genomes. 
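
To make the distinction between the two quantities concrete, the following minimal sketch computes, for a list of genomic lengths, the smallest fraction of the largest genes that together hold half of the total intragenic length. The two length lists are invented for illustration and are not data from this paper; the function name is hypothetical.

```python
def fraction_of_genes_holding_half(lengths):
    """Smallest fraction of genes (largest first) whose summed length
    reaches at least half of the total length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2.0
    running = 0.0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return count / len(ordered)
    return 1.0  # only reached for an empty list


# Illustrative genomic lengths in kb (NOT data from the paper).
broad = [1, 2, 3, 5, 8, 12, 20, 40, 100, 300]   # spans more than 2 orders of magnitude
narrow = [1, 2, 2, 2, 3, 3, 3, 4, 4, 5]         # spans less than 1 order of magnitude

print(fraction_of_genes_holding_half(broad))    # a single gene holds half the length
print(fraction_of_genes_holding_half(narrow))   # several genes are needed
```

With a broad distribution, missing only the largest gene removes more than half of the intragenic sequence while the gene count is off by a single gene, which is the asymmetry described above.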

Our solution is to determine the distribution of 
genomic lengths entirely from cDNA-to-genomic 
alignments (i.e., independent of the exon-prediction
programs). We then compare the mean genomic length
to the mean gene-to-gene distance. The former is taken
from the cDNA alignments, but the latter is computed
as the euchromatic genome size divided by the gene
count, with the gene count taken from the annotations. Reliable
results are expected for Drosophila melanogaster and 
Caenorhabditis elegans, because genome sequencing for 
these organisms is complete and estimates of the gene- 
to-gene distance are available. For Arabidopsis thaliana, 
the published chromosomes (Lin et al. 1999; Mayer et 
al. 1999) agree to 4.5%, so we can safely extrapolate to 
the entire genome. In contrast, for H. sapiens, the published
chromosomes (Dunham et al. 1999; Hattori et
al. 2000) differ by 243%, reflecting the heterogeneity
in the gene densities of warm-blooded vertebrates (Bernardi
2000). Given this heterogeneity, and the difficulty of determining
the mean genomic length that results from the lack of
large genomic contigs, we refer extensively to the
model organism results to guide our interpretations of 
the H. sapiens data. 

RESULTS 

Figure 1 depicts the distribution of genomic lengths for 
H. sapiens, D. melanogaster, C. elegans, and A. thaliana.
Table 1 is a numerical summary. The animal distribu- 
tions span 2-3 orders of magnitude, but the plant dis- 
tribution spans only one order of magnitude. The im- 
plication for the large-genes problem can be estimated 
by considering how many of the largest genes would 
have to be unidentified for half of the intragenic space 
to be missing. The figures range from 11% and 10% at
one extreme, in H. sapiens and D. melanogaster, to 30% 
at the other extreme, in A. thaliana. Furthermore, the 
only organism in which the intergenic fraction is

Figure 1 Distribution of genomic lengths for (a) Homo sapiens, (b) Drosophila melanogaster, (c) Caenorhabditis elegans, and (d)
Arabidopsis thaliana. Dark shading indicates strong hits. Weak hits (lightly shaded) represent cDNA-to-genomic alignments with <3 exons 
or <50% of the cDNA length aligned. An overwhelming majority of these weak hits are actually complete alignments with only one or 
two exons. Instances in which <50% of the cDNA is aligned represent 7.3%, 3.3%, 1.2%, and 0.9% of the genes in the four organisms, 
respectively.

Table 1. Estimated Intergenic Fractions

                   Homo         Drosophila     Caenorhabditis   Arabidopsis
                   sapiens      melanogaster   elegans          thaliana

Euchromatin        3180000      123000         97800            130000
Sequenced DNA      369000       123000         91000            119000
Gene-to-gene       45.4         9.0            5.3              4.7
cDNA aligned       1061         1628           583              1401
Genomic quality    1.2          23.3           2.4              15.7
Nested genes       6%           8%             4%               1%
05 Percentile      2.5          0.9            0.8              0.9
Genomic length     43.4         9.5            5.0              2.6
95 Percentile      165.5        36.3           14.2             5.4
%, missing half    11%          10%            21%              30%
Intergenic DNA     Discussed    3%             10%              46%
                   in text of
                   article



The first three rows list the euchromatic genome size, the amount of genomic sequence that was analyzed, 
and the annotation-based estimate of the gene-to-gene distance. The next three rows describe the cDNA 
alignments. These rows list the number of aligned cDNAs, our quality assessment for the genomic contigs (i.e., 
the median of the genomic contig size divided by the genomic length for the 95th-percentile gene), and our 
estimate of the frequency of nested genes (i.e., genes on the reverse strand or inside an intron). The genomic 
length is given in the next three rows by its arithmetic mean and its 5th and 95th percentile values. Next, we
indicate what fraction of the largest genes would have to be unidentified for half of the intragenic space to be 
missing. The last row lists the intergenic fraction, computed by correcting the mean genomic length for nested 
genes, dividing that by the mean gene-to-gene distance, and subtracting the result from one. Note: In 
Drosophila melanogaster, we do not count scaffold joins longer than 1 kb as contiguous when computing the
genomic quality. All lengths are reported in kb.
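
The calculation described for the last row of Table 1 can be sketched as follows. This is an illustration under one stated assumption, not the authors' code: the nested-gene correction is applied here as a simple multiplicative factor (1 - nested fraction), a form the caption does not spell out. The function name is hypothetical; the inputs are the rounded Table 1 values.

```python
def intergenic_fraction(mean_genomic_kb, gene_to_gene_kb, nested_fraction):
    """1 - (nested-corrected mean genomic length / mean gene-to-gene distance).

    ASSUMPTION: the nested-gene correction is a multiplicative factor
    (1 - nested_fraction); the source does not give its exact form.
    """
    corrected = mean_genomic_kb * (1.0 - nested_fraction)
    return 1.0 - corrected / gene_to_gene_kb


# (mean genomic length in kb, gene-to-gene distance in kb, nested-gene fraction)
table1 = {
    "D. melanogaster": (9.5, 9.0, 0.08),
    "C. elegans": (5.0, 5.3, 0.04),
    "A. thaliana": (2.6, 4.7, 0.01),
}

for organism, row in table1.items():
    print(f"{organism}: {intergenic_fraction(*row):.1%}")
```

Under this assumption the results land within about one percentage point of the tabulated 3%, 10%, and 46%, which is plausible given that the table inputs are rounded.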



greater than 10% is A. thaliana, even though we have
included the minor correction for nested genes (genes 
on the reverse strand or inside an intron). This correc- 
tion is computed by counting the occurrences of 
nested genes in our cDNA alignments, and adjusting 
for the fact that we do not detect every such occurrence 
because we do not have all of the cDNAs. 

The main uncertainty in our method is that we 
must extrapolate from a subset of the genes to the en- 
tire genome to determine the mean genomic length. 
There will be sampling biases, which can be categorized
as follows: (1) the extent to which cDNA data are
enriched for large or small genes, and (2) the extent to
which genomic data are biased toward large or small
genes. The second category raises two further questions:
Are the gene-rich regions sequenced first? Are the contigs
large enough for us to align the large genes?

We will argue that the problem is primarily in the 
genomic data, not the cDNA data. Furthermore, to the 
extent that there are sampling biases, the tendencies 
are always to underestimate the mean genomic length 
and to overestimate the intergenic fraction. 

There are two reasons to suspect that biases in the 
cDNA data will cause us to underestimate the mean 
genomic length. Keep in mind that large genes are 
highly correlated with large cDNAs (this paper; data 
not shown). The first reason is that full-length
cDNAs are extremely difficult to clone, given the ease 
with which RNA molecules are degraded and the in- 
trinsic bias in the cloning system for smaller inserts. 



The second reason is that large RNA molecules require 
more time to transcribe, so large genes might be less 
highly expressed and more difficult to isolate. How- 
ever, this expectation is incorrect, because the tran- 
scription machinery operates in parallel. As a measure 
of the expression levels, in H. sapiens, we aligned the 
1,856,102 ESTs in GenBank against our cDNA data. 
Multiple reads from the same clone were counted only
once. Figure 2 shows that there is no significant varia- 
tion in EST coverage as a function of genomic length. 
Notice that the normalization procedures (Hillier et al. 
1996) applied to the EST libraries do not affect the rare 
transcripts, in which we were looking for an effect. The 
conclusion is that cDNA data, extracted from Gen- 
Bank, can be representative of all genomic lengths. 

Genomic data are biased in two ways. First, there is 
a sociologic bias toward sequencing gene-rich regions 
first. Second, even when a genome is complete, lack of 
long-range contiguity, on the scale of the largest genes, 
will reduce the estimate of the mean genomic length, 
because any breaks in the alignment are most likely to 
occur across the largest introns. Both issues are rel- 
evant in the H. sapiens data. In Figure 3, we demon- 
strate that the aligned data are biased toward GC-rich 
genes, which are of smaller genomic lengths (Bernardi
2000). As for contiguity, we estimate the extent of the 
problem by computing the ratio of the median ge- 
nomic contig size to the genomic length of the 95th 
percentile gene. Ideally, this ratio would be much 
greater than one. Table 1 shows that it is much greater

Figure 2 Is the collection of Homo sapiens cDNA sequence biased? We aligned
the 1,856,102 ESTs in GenBank to our cDNA sequences and plotted the number
of aligned ESTs as a function of the genomic length. Multiple reads from the same
clone are counted only once. There is no obvious bias, indicating that cDNAs for
genes of every genomic length are equally easy to isolate. (Figure annotation:
average 39.9 unique ESTs per cDNA.)



than one in D. melanogaster and A. thaliana. It is only 
moderately greater than one in C. elegans, but that is 
less important for this organism, because the genomic 
lengths are not as broadly distributed. 
However, in H. sapiens, the ratio is 1.2, and 
it would have been even smaller had we 
not used genomic data from a new division 
of GenBank in which all of the overlapping
clones have been joined together (Jang et
al. 1999).
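
The contig quality measure used in this paragraph (median genomic contig size divided by the genomic length of the 95th-percentile gene) can be sketched as below. The nearest-rank percentile definition and the toy numbers are assumptions made for illustration; the text does not say which percentile convention was used.

```python
import statistics


def percentile(values, pct):
    """Nearest-rank percentile (integer pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, -(-len(ordered) * pct // 100))  # ceil(n * pct / 100)
    return ordered[rank - 1]


def contig_quality(contig_sizes_kb, genomic_lengths_kb):
    """Median contig size divided by the 95th-percentile genomic length."""
    return statistics.median(contig_sizes_kb) / percentile(genomic_lengths_kb, 95)


# Toy numbers in kb (NOT data from the paper).
contigs = [50, 190, 400]
genomic_lengths = list(range(10, 210, 10))   # 10, 20, ..., 200

print(contig_quality(contigs, genomic_lengths))
```

A value near one, as in this toy case, means a typical contig is only about as long as the largest genes, so alignments of those genes are likely to be broken.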

We can estimate the severity of these 
biases with the different versions of the D. 
melanogaster genomic data. Specifically, we
repeated the alignments with the same 
cDNA data but switched to the 34.9 Mb of 
finished clone-by-clone genomic data that 
was available prior to the completion of the 
whole-genome shotgun (Adams et al. 
2000). The contig quality measure is then 
2.8, and the resultant mean genomic 
length of 7.1 kb is off the mark by 34%. By 
comparing those cDNAs aligned in both 
data sets, we find that 16% of this effect is 
attributable to the contiguity problem. The 
other 18% is attributable to the bias toward 
sequencing gene-rich regions first. An even 
more dramatic example of these biases is 
Mus musculus, which has a contig quality
measure of 0.3 and a mean genomic length 



of 9.7 kb. If we assume that there is no difference
between M. musculus and H. sapiens,
this estimate is off the mark by 447%.
Parenthetically, another unreliable way to 
estimate the mean genomic length is to ex- 
tract GenBank annotations. The annotated 
genes in that 34.9 Mb of genomic data for 
D. melanogaster have a mean genomic
length of 3.0 kb, which is off the mark by 
317%. 
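
The percentages quoted in the two paragraphs above can be reproduced from the stated mean genomic lengths. The sketch below is one reading of the arithmetic, not a calculation given in the text: the 34% figure matches the relative excess of the 9.5 kb whole-genome mean over the 7.1 kb clone-by-clone mean, whereas the 447% and 317% figures match plain ratios of the reference mean to the estimate. The function names are hypothetical.

```python
def relative_excess(reference_kb, estimate_kb):
    """How much larger the reference is than the estimate, as a fraction."""
    return reference_kb / estimate_kb - 1.0


def ratio(reference_kb, estimate_kb):
    """Reference divided by estimate."""
    return reference_kb / estimate_kb


# D. melanogaster: whole-genome mean (9.5 kb) vs clone-by-clone mean (7.1 kb)
print(round(relative_excess(9.5, 7.1), 2))
# H. sapiens mean (43.4 kb) vs M. musculus mean (9.7 kb)
print(round(ratio(43.4, 9.7), 2))
# D. melanogaster mean (9.5 kb) vs GenBank-annotation mean (3.0 kb)
print(round(ratio(9.5, 3.0), 2))
```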

The essential conclusion is that our 
43.4 kb figure for the mean genomic length 
in H. sapiens is a substantial underestimate, 
even if it is already 10 times larger than the 
training sets used for these exon-prediction 
programs. However, the gene count itself is 
also uncertain. The traditional estimate of 
70,000 (Antequera and Bird 1993; Fields et 
al. 1994) has recently been challenged by
substantially lower estimates, from 35,000
to 45,000 (Ewing and Green 2000; Hattori
et al. 2000; Roest Crollius et al. 2000). How 
can we interpret the H. sapiens data? If we 
accept the traditional gene count of 70,000, 
our mean genomic length of 43.4 kb predicts
an intergenic fraction of 10%. Suppose
we inflate our estimate by the same
34% discrepancy that was observed between the two D. 
melanogaster data sets. The gene count that would be 
consistent with the same 10% intergenic fraction is 

[Figure 3 plot. Axis labels: number of cDNAs, cDNA's GC. Annotation: fraction aligned = 0.146]

Figure 3 Is the collection of Homo sapiens genomic sequence biased? We computed
the probability that cDNAs of a particular GC content aligned to genomic
sequence, given that only 369 Mb of nonredundant finished genomic sequence
were available. The solid line (on an arbitrary scale) indicates the initial collection 
of cDNAs. The obvious bias toward GC-rich cDNAs is important because these are 
known to correspond to smaller genes (Bernardi 2000). Dark shading shows 
strong hits; light shading shows weak hits.

then 51,400. Considering that the contig quality
is much worse in H. sapiens than in the clone-by-
clone D. melanogaster data, it is likely that the
mean genomic length is underestimated by 
>34%. Thus, the gene count would have to be 
substantially less than the current low estimates 
of 35,000 to 45,000 for our arguments to allow 
much intergenic DNA. 

Given the uncertainty in our method, we 
cannot give a precise estimate for the intergenic 
fraction in H. sapiens. However, we are prepared 
to argue that the intergenic fraction in H. sapiens
cannot be as large as it is for A. thaliana, because,
at such a high intergenic fraction, the distribution
of GC content for genomic DNA is bimodal,
as in Figure 4. Fitting the data to a sum of Gaussians
reveals that the main mode is centered at
0.382, which is almost identical to the 0.390 GC
content of the aligned A. thaliana genes. The rela- 
tive ratio of the two modes implies an intergenic 
fraction of 30%, which is smaller than the 46% 
estimate derived from genomic length argu- 
ments but not unexpectedly so, because some of 
the intergenic DNA could have a GC content that 
is similar to the intragenic DNA. The reason why 
this bimodality has not been reported previously
is that it is extremely sensitive to how the data 
are plotted. Specifically, the histogram bins must 
be smaller than the mean genomic length, and 
smaller genomic contigs (i.e., those sequenced 
because they contain a likely gene) cannot be 
used. That said, no such bimodality is observed 
in H. sapiens, D. melanogaster, or C. elegans, re- 
gardless of how the data are plotted. 
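
The sensitivity to bin size can be illustrated with a toy example. The sketch below computes GC content in non-overlapping bins of a synthetic sequence built from alternating GC-richer "gene" blocks and GC-poorer "spacer" blocks. The sequence, block sizes, and function name are invented for illustration and are not the data or code used here.

```python
def gc_content_by_bin(sequence, bin_size):
    """GC fraction of each full, non-overlapping window of `bin_size` bases."""
    fractions = []
    for start in range(0, len(sequence) - bin_size + 1, bin_size):
        window = sequence[start:start + bin_size].upper()
        fractions.append((window.count("G") + window.count("C")) / bin_size)
    return fractions


# Toy sequence (NOT real data): alternating 40-base blocks.
gene = "GC" * 10 + "AT" * 10      # 50% GC
spacer = "GC" * 4 + "AT" * 16     # 20% GC
toy = (gene + spacer) * 5

# Bins no larger than one block: two distinct GC values appear.
print(sorted(set(gc_content_by_bin(toy, 40))))
# Bins spanning a gene plus a spacer: the two modes merge into one value.
print(sorted(set(gc_content_by_bin(toy, 80))))
```

Once a bin is large enough to span both kinds of block, every bin reports the same intermediate GC content, which is the way large histogram bins can hide a bimodal distribution.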

DISCUSSION 

So why do most genome annotation efforts con- 
tinue to report so much intergenic DNA? One of 
the most conspicuous features of the recent an- 
notations for H. sapiens chromosomes 21 and 22 
is the small handful of megabase-sized regions 



Figure 4 Distribution of GC content for anonymous genomic
sequence in Arabidopsis thaliana. The idea that a
significant fraction of the genome is intergenic, coupled
with the fact that intergenic DNA has a lower GC content
than intragenic DNA, suggests that this distribution will be
bimodal. However, the bimodality is easily obscured by
how the data are plotted. Panels a and b differ in the size of the
bins over which the GC content is computed, 1 kb and 5
kb, respectively. Bin sizes larger than the average gene size
of 2.6 kb obscure the effect because every bin is likely to
contain a mixture of intragenic and intergenic DNA. Panels a and
c differ in the genomic contigs that are plotted (every
contig or only contigs <35 kb, respectively). By removing
the large-insert clones favored by the genome centers,
what is left behind are those sequences that were analyzed
only because they contain a likely gene. Hence, the bimodality
disappears.

with absolutely no annotated genes. In all likelihood,
each of these regions has one or more large genes that have
no counterpart in the EST/cDNA/protein data and
that are not being detected by the exon-prediction
programs. After accounting for large genes, the remain- 
der of the presently unannotated regions will likely be 
attributed to untranslated non-coding exons and 
flanking introns. We must reiterate that the fraction of 
the genes that is missing does not have to be large to 
explain away most of the unannotated regions. 

What is important is not the precise intergenic 
fraction or the precise gene count but, at the risk of 
extrapolating from a limited number of genomes, the 
differences between plants and animals. There is evi- 
dence that plant and animal genomes are organized in 
different ways. In H. sapiens, large genes are caused by 
a combination of large introns and more introns per 
gene (this paper; data not shown). At least 35.4% of the 
total length of the introns in our H. sapiens data is due
to interspersed repeats (e.g., Alu and L1). The true fraction
is undoubtedly greater, as older repeats, whose
sequences are >50% diverged from the ancestral con- 
sensus, cannot be identified by existing methods (Smit 
1996). Analysis of orthologous genes in Fugu rubripes
and H. sapiens reveals that much of the 10-fold difference
in the sizes of these two genomes can be explained
by differences in intron sizes (Elgar et al. 1996).
In contrast, analysis of syntenic loci among grasses reveals
that much of the 40-fold difference in the size of
these genomes can be explained by their extensive re- 
peat-filled intergenic regions (SanMiguel et al. 1996; 
Bennetzen et al. 1998). 

The conclusion is that, in animals, most repeats 
integrate into intron DNA, but, in plants, most repeats 
integrate into intergenic DNA. Is there something dif- 
ferent about the nature of the repeats that insert into 
animals and plants? Does this dichotomy reflect differ- 
ences in the operation of the introns and promoters? 
The answers to these questions will be critical for our 
understanding of the evolution of large-scale genome 
features. 

METHODS 

In H. sapiens, cDNA data were extracted from GenBank release
112, but genomic data were downloaded, at the same time,
from the new division for nonredundant joined contigs (Jang
et al. 1999). In D. melanogaster, cDNA data were taken from
release 115 (Dec/15/1999), but genomic data were taken from
the whole-genome shotgun (Adams et al. 2000). In C. elegans
and A. thaliana, both cDNA and genomic data were extracted
from release 115. 

For the cDNA-to-genomic alignments, we required a 98% 
base pair agreement. We scanned the intron sequences for the 
consensus splice sites, GT-AG, but we also accepted GC-AG as a
substitute, albeit in <1% of the data. Weak hits, defined
as those with <3 exons or <50% of the cDNA length aligned, 
were plotted separately to verify that they were not anoma- 



lous. Immune system-related cDNAs (i.e., with the descriptors 
immunoglobin, Ig, HLA, MHC, V-region, etc.) were removed. 
Other redundancies were eliminated both up front, by removing all
cDNAs that are 90% contained in some other cDNA, and post-alignment,
by comparing the genomic coordinates of the
aligned exons. Raw genomic lengths were extrapolated to
compensate for incomplete alignments, a small correction
even for H. sapiens, where a total of 86% of the cDNA lengths 
was aligned. As another quality control, we required that the 
exact coordinates of the coding region (i.e., the open reading 
frame) be known, even though it reduced the number of 
genes in our final data set. 
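
Two of the filters described in this paragraph, the splice-site check and the weak-hit definition, can be sketched as simple predicates. This is a minimal illustration with hypothetical function names and made-up example sequences; it is not the alignment pipeline used for the paper and omits the 98% identity requirement.

```python
def is_accepted_intron(intron_seq):
    """True if the intron starts and ends with GT-AG or the substitute GC-AG."""
    seq = intron_seq.upper()
    return len(seq) >= 4 and seq[:2] in ("GT", "GC") and seq[-2:] == "AG"


def is_weak_hit(num_exons, aligned_bases, cdna_length):
    """Weak hit: fewer than 3 exons or less than 50% of the cDNA length aligned."""
    return num_exons < 3 or aligned_bases < 0.5 * cdna_length


# Made-up examples (NOT data from the paper).
print(is_accepted_intron("GTAAGTCCAG"))   # consensus GT-AG
print(is_accepted_intron("GCAAGTTTAG"))   # accepted substitute GC-AG
print(is_accepted_intron("ATAAGTTTAC"))   # neither
print(is_weak_hit(2, 900, 1000))          # too few exons
print(is_weak_hit(5, 900, 1000))          # strong hit
```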

The partial alignment correction is done by computing 
an adjusted number of exons, $N_{exon}$, with a linear extrapolation.
The adjusted genomic length, $L_{genomic} = N_{exon}\langle L_{exon}\rangle +
(N_{exon} - 1)\langle L_{intron}\rangle$, is extrapolated in a similarly linear
manner, with the averages $\langle L_{exon}\rangle$ and $\langle L_{intron}\rangle$ being defined on
a per gene basis. Because noncoding terminal exons are gen- 
erally larger than coding interior exons, both extrapolations 
are only performed across the coding portion of the cDNA 
sequence. The intention is to ensure that, if anything, we 
underestimate the mean genomic length. 
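
The partial alignment correction can be sketched as follows. The formula for the adjusted genomic length follows the description above; the way the adjusted exon count is obtained is an assumption made for this illustration (the observed exon count is scaled by the ratio of the total coding length to the aligned coding length), since the text says only that the extrapolation is linear. The function name and example numbers are hypothetical.

```python
def adjusted_genomic_length(exon_lengths, intron_lengths, coding_length):
    """Extrapolate a partial alignment to a full-gene genomic length.

    exon_lengths / intron_lengths: aligned coding exons and the introns
    between them (same units, e.g. bp).
    coding_length: total coding length of the cDNA.
    ASSUMPTION: the adjusted exon count scales the observed count by
    coding_length / aligned coding length.
    """
    aligned = sum(exon_lengths)
    n_exon = len(exon_lengths) * coding_length / aligned
    mean_exon = aligned / len(exon_lengths)          # per-gene average exon length
    mean_intron = (sum(intron_lengths) / len(intron_lengths)
                   if intron_lengths else 0.0)       # per-gene average intron length
    return n_exon * mean_exon + (n_exon - 1) * mean_intron


# Made-up example in bp (NOT data from the paper): half of the coding
# region is aligned, so the exon count doubles from 3 to 6.
print(adjusted_genomic_length([100, 200, 300], [1000, 3000], 1200))
# Fully aligned gene: the result equals the sum of exons and introns.
print(adjusted_genomic_length([100, 200, 300], [1000, 3000], 600))
```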
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